Title: SH2-CONTAINING INOSITOL-PHOSPHATASE

FIELD OF THE INVENTION

The invention relates to a novel SH2-containing inositol-phosphatase, truncations, 
analogs, homologs and isoforms thereof; nucleic acid molecules encoding the protein and 
truncations, analogs, and homologs of the protein; and, uses of the protein and nucleic acid
molecules. 

BACKGROUND OF THE INVENTION

Many growth factors regulate the proliferative, differentiative and metabolic
activities of their target cells by binding to, and activating, cell surface receptors that have
tyrosine kinase activity (Cantley, L.C., et al., 1991, Cell 64:281-302; and Ullrich, A., and J.
Schlessinger, 1990, Cell 61:203-212). The activated receptors become tyrosine phosphorylated
through intermolecular autophosphorylation events, and then stimulate intracellular
signalling pathways by binding to, and phosphorylating, cytoplasmic signalling proteins
(Cantley, L.C., et al., 1991, Cell 64:281-302; and Ullrich, A., and J. Schlessinger, 1990, Cell
61:203-212). Many cytoplasmic signalling proteins share a common structural motif, known as
the src homology 2 (SH2) domain, that mediates their association with specific
phosphotyrosine-containing sites on activated receptors (Heldin, C.H., 1991, Trends Biochem.
Sci. 16:450-452; Koch, C.A., et al., 1991, Science 252:669-674; Margolis, B., 1992, Cell Growth
Differ. 3:73-80; McGlade, C.J., et al., 1992, Mol. Cell. Biol. 12:991-997; Moran, M.F., et al., 1990,
Proc. Natl. Acad. Sci. USA 87:8622-8626; and Reedijk, M., et al., 1992, EMBO J. 11:1365-1372).

Two SH2-containing proteins, Grb2 and Shc, have been implicated in the Ras
signalling pathway (Lowenstein, E.J., et al., 1992, Cell 70:431-442; and Pelicci, G., et al., 1992,
Cell 70:93-104). Grb2 and Shc act upstream of Ras and bind directly to activated receptors
(Buday, L., and J. Downward, 1993, Cell 73:611-620; Matuoka, K., et al., 1993, EMBO J. 12:3467-3473;
Oakley, B.R., et al., 1980, Anal. Biochem. 105:361-363; Reedijk, M., et al., 1992, EMBO J.
11:1365-1372; Rozakis-Adcock, M., et al., 1992, Nature 360:689-692; and Songyang, Z., et al.,
1993, Cell 72:767-778), or to designated SH2 docking proteins, such as the insulin receptor
substrate 1 (IRS-1), which is tyrosine phosphorylated in response to insulin (Baltensperger, K.,
et al., Science 260:1950-1952; Pelicci, G., et al., 1992, Cell 70:93-104; Skolnik, E.Y., 1993, EMBO
J. 12:1929-1936; Skolnik, E.Y., et al., 1993, Science 260:1953-1955; and Suen, K-L., et al., 1993,
Mol. Cell. Biol. 13:5500-5512).

Grb2 is a 25 kDa adapter protein with two SH3 domains flanking one SH2 domain. It
has been shown in fibroblasts to shuttle its constitutively bound Ras guanine nucleotide
exchange factor, Sos1, to activated receptors (or to IRS-1 (Skolnik, E.Y., 1993, EMBO J. 12:1929-1936;
and Skolnik, E.Y., et al., 1993, Science 260:1953-1955)) (Baltensperger, K., et al., Science
260:1950-1952; Buday, L., and J. Downward, 1993, Cell 73:611-620; Egan, S.E., et al., 1993,
Nature (London) 367:87-90; Gale, N.W., et al., 1993, Nature (London) 363:88-92; Li, N., et al.,
1993, Nature (London) 363:85-88; Olivier, J.P., et al., 1993, Cell 73:179-191; and Rozakis-Adcock,
M., et al., 1993, Nature (London) 363:83-85). Binding of the SH2 domain of Grb2 to
tyrosine phosphorylated proteins activates Sos1, which then catalyzes the activation of Ras
by exchanging GDP for GTP (Buday, L., and J. Downward, 1993, Cell 73:611-620; Egan,
S.E., et al., 1993, Nature 363:45-51; Gale, N.W., et al., 1993, Nature 363:88-92; Li, N., et al., 1993,
Nature 363:85-88).

Shc is also an adapter protein that is widely expressed in all tissues. The protein
contains an N-terminal phosphotyrosine binding (PTB) domain (Kavanaugh, W.M., et al., 1995,
Science 268:1177-1179; Craparo, A., et al., 1995, J. Biol. Chem. 270:15639-15643; van der Geer,
P., & Pawson, T., 1995, TIBS 20:277-280; Batzer, A.G., et al., Mol. Cell. Biol. 1995, 15:4403-4409;
and Trub, T., et al., 1995, J. Biol. Chem. 270:18205-18208) and a C-terminal SH2 domain
(Pelicci, G., et al., 1992, Cell 70:93-104) and can associate, in its tyrosine phosphorylated form,
with Grb2-Sos1 complexes and may increase Grb2-Sos1 interactions following growth factor
stimulation (Egan, S.E., et al., 1993, Nature 363:45-51; Rozakis-Adcock, M., et al., 1992, Nature
360:689-692; and Ravichandran, K.S., 1995, Mol. Cell. Biol. 15:593-600). Shc appears to
function as a bridge between Grb2-Sos1 complexes and tyrosine kinases where the latter are
incapable, for lack of an appropriate consensus sequence, of binding Grb2-Sos1 directly (Egan,
S.E., et al., 1993, Nature 363:45-51).

Preliminary evidence suggests that Shc and Grb2 may be used by members of the
hemopoietin receptor superfamily (Cutler, R.L., et al., 1993, J. Biol. Chem. 268:21463-21465;
Damen, J.E., et al., 1993, Blood 82:2296-2303). Although members of this family lack
endogenous kinase activity, following ligand binding, they are apparently tyrosine 
phosphorylated by a closely associated JAK family member (Argetsinger, L.S., et al., 1993, 
Cell 74:237-244; Lutticken, C, et al., 1994, Science 263:89-92; Silvennoinen, O., et al., 1993, 
Proc. Natl. Acad. Sci. USA 90:8429-8433; and Witthuhn, B.A., et al., 1993, Cell 74:227-236). 
The hemopoietic growth factors, erythropoietin (Ep), interleukin-3 (IL-3) and steel factor (SF) 
(which utilizes a receptor with endogenous tyrosine kinase activity, i.e., c-kit (Chabot, B., et
al., 1988, Nature (London) 335:88-89)), have been shown to induce the tyrosine
phosphorylation of Shc and its subsequent association with Grb2 (Cutler, R.L., et al., 1993, J.
Biol. Chem. 268:21463-21465). Stimulation of members of the hemopoietin receptor
superfamily has also been reported to result in the association of Shc with uncharacterized
proteins with molecular masses of 130 kDa (Smit, L., et al., J. of Biol. Chem. 269(32):20209,
1994), 150 kDa (Lioubin, M.N., et al., Mol. Cell. Biol. 14(9):5682, 1994), and 145 kDa (Damen,
J., et al., Blood 82(8):2296, 1993, and Saxton, T.M., et al., J. Immunol. 623, 1994).
SUMMARY OF THE INVENTION 

The present inventor has identified and characterized a protein that associates with
Shc in response to multiple cytokines. The unique protein, herein referred to as "SH2-containing
inositol-phosphatase" or "SHIP" (for SH2-containing inositol 5-phosphatase),
contains an amino terminal src homology 2 (SH2) domain, two phosphotyrosine binding (PTB)
consensus sequences, a proline rich region, and two motifs highly conserved among inositol
polyphosphate-5-phosphatases (phosphoIns-5-ptases). Cell lysates immunoprecipitated
with antiserum to the protein exhibit phosphoIns-5-ptase activity, in particular, both
phosphatidylinositol trisphosphate (PtdIns-3,4,5-P3) and inositol tetraphosphate
(Ins-1,3,4,5-P4) 5-phosphatase activity. This activity implicates SHIP in the regulation of
signalling pathways that control gene expression, cell proliferation, differentiation,
activation, and metabolism, in particular, the Ras and phospholipid signalling pathways.
This finding permits the identification of substances which affect SHIP and which may be
used in the treatment of conditions involving perturbation of signalling pathways.

The present invention therefore provides a purified and isolated nucleic acid molecule 
comprising a sequence encoding an SH2-containing inositol-phosphatase which has a src 
homology 2 (SH2) domain and exhibits phosphoIns-5-ptase activity. The SH2-containing 
inositol-phosphatase is further characterized by its ability to associate with Shc and by
having two phosphotyrosine binding (PTB) consensus sequences, a proline rich region, and
motifs highly conserved among inositol polyphosphate-5-phosphatases (phosphoIns-5- 
ptases). 

In an embodiment of the invention, the purified and isolated nucleic acid molecule 
comprises (i) a nucleic acid sequence encoding an SH2-containing inositol-phosphatase having
the amino acid sequence as shown in SEQ ID NO:2 or Figure 2 (A); and, (ii) nucleic acid
sequences complementary to (i). In another embodiment of the invention, the purified and 
isolated nucleic acid molecule comprises (i) a nucleic acid sequence encoding an SH2-containing 
inositol-phosphatase having the amino acid sequence as shown in SEQ ID NO:8 or Figure 11; 
and, (ii) nucleic acid sequences complementary to (i). 

In a preferred embodiment of the invention, the purified and isolated nucleic acid
molecule comprises

(i) a nucleic acid sequence encoding an SH2-containing inositol-phosphatase having
the nucleic acid sequence as shown in SEQ ID NO:1 or Figure 3, wherein T can also be U;

(ii) a nucleic acid sequence complementary to (i), preferably complementary to the full
length nucleic acid sequence shown in SEQ ID NO:1 or Figure 3; or

(iii) a nucleic acid molecule differing from any of the nucleic acids of (i) and (ii) in 
codon sequences due to the degeneracy of the genetic code. 

In another preferred embodiment of the invention, the purified and isolated nucleic 
acid molecule comprises 

(i) a nucleic acid sequence encoding an SH2-containing inositol-phosphatase having
the nucleic acid sequence as shown in SEQ ID NO:7 or Figure 10, wherein T can also be U;

(ii) a nucleic acid sequence complementary to (i), preferably complementary to the full
length nucleic acid sequence shown in SEQ ID NO:7 or Figure 10; or

(iii) a nucleic acid molecule differing from any of the nucleic acids of (i) and (ii) in
codon sequences due to the degeneracy of the genetic code.

The invention also contemplates (a) a nucleic acid molecule comprising a sequence 
encoding a truncation of the SH2-containing inositol-phosphatase, an analog or homolog of the 
SH2-containing inositol-phosphatase or a truncation thereof, (herein collectively referred to
as "SHIP related protein" or "SHIP related proteins"); (b) a nucleic acid molecule comprising a 
sequence which hybridizes under high stringency conditions to the nucleic acid encoded by a 
SH2-containing inositol-phosphatase having the amino acid sequence as shown in SEQ ID 
NO:2 or Figure 2 (A), or SEQ ID NO:8 or Figure 11, wherein T can also be U, or complementary 
sequences thereto, or by a SHIP related protein; and (c) a nucleic acid molecule comprising a
sequence which hybridizes under high stringency conditions to the nucleic acid encoded by the 
SH2-containing inositol-phosphatase having the nucleic acid sequence as shown in SEQ ID 
NO:1 or Figure 3, or SEQ ID NO:7 or Figure 10, wherein T can also be U, or complementary
sequences thereto. 

The invention further contemplates a purified and isolated double stranded nucleic
acid molecule containing a nucleic acid molecule of the invention, hydrogen bonded to a
complementary nucleic acid base sequence. 

The nucleic acid molecules of the invention may be inserted into an appropriate 
expression vector, i.e. a vector which contains the necessary elements for the transcription and
translation of the inserted coding sequence. Accordingly, recombinant expression vectors
adapted for transformation of a host cell may be constructed which comprise a nucleic acid 
molecule of the invention and one or more transcription and translation elements operatively 
linked to the nucleic acid molecule. 

The recombinant expression vector can be used to prepare transformed host cells
expressing SH2-containing inositol-phosphatase or a SHIP related protein. Therefore, the
invention further provides host cells containing a recombinant molecule of the invention. The 
invention also contemplates transgenic non-human mammals whose germ cells and somatic cells 
contain a recombinant molecule comprising a nucleic acid molecule of the invention which 
encodes an analog of SH2-containing inositol-phosphatase, i.e. the protein with an insertion,
substitution or deletion mutation.

The invention further provides a method for preparing a novel SH2-containing 
inositol-phosphatase, or a SHIP related protein utilizing the purified and isolated nucleic 
acid molecules of the invention. In an embodiment a method for preparing an SH2-containing 
inositol-phosphatase or a SHIP related protein is provided comprising (a) transferring a
recombinant expression vector of the invention into a host cell; (b) selecting transformed host
cells from untransformed host cells; (c) culturing a selected transformed host cell under 
conditions which allow expression of the SH2-containing inositol-phosphatase or SHIP 
related protein; and (d) isolating the SH2-containing inositol-phosphatase or SHIP related
protein. 

The invention further broadly contemplates a purified and isolated SH2-containing 
inositol-phosphatase which contains an SH2 domain and which exhibits phosphoIns-5-ptase
activity. In an embodiment of the invention, a purified SH2-containing inositol-phosphatase
is provided which has the amino acid sequence as shown in SEQ ID NO:2 or Figure 2 (A). In 
another embodiment of the invention, a purified SH2-containing inositol-phosphatase is 
provided which has the amino acid sequence as shown in SEQ ID NO:8 or Figure 11. The 
purified and isolated protein of the invention may be activated i.e. phosphorylated. The 
invention also includes truncations of the protein and analogs, homologs, and isoforms of the
protein and truncations thereof (i.e. "SHIP related proteins"). 

The SH2-containing inositol-phosphatase or SHIP related proteins of the invention 
may be conjugated with other molecules, such as proteins to prepare fusion proteins. This may 
be accomplished, for example, by the synthesis of N-terminal or C-terminal fusion proteins.

The invention further contemplates antibodies having specificity against an epitope
of SH2-containing inositol-phosphatase or a SHIP related protein of the invention.
Antibodies may be labelled with a detectable substance and they may be used to detect the 
SH2-containing inositol-phosphatase or a SHIP related protein of the invention in tissues and 
cells. 

The invention also permits the construction of nucleotide probes which are unique to
the nucleic acid molecules of the invention and accordingly to SHIP or a SHIP related protein
of the invention. Thus, the invention also relates to a probe comprising a sequence encoding 
SH2-containing inositol-phosphatase or a SHIP related protein. The probe may be labelled,
for example, with a detectable substance and it may be used to select from a mixture of
nucleotide sequences a nucleotide sequence coding for a protein which displays one or more of
the properties of SHIP. 

The invention still further provides a method for identifying a substance which is 
capable of binding to SHIP, or a SHIP related protein or an activated form thereof, comprising 
reacting SHIP, or a SHIP related protein, or an activated form thereof, with at least one
substance which potentially can bind with SHIP, or a SHIP related protein or an activated
form thereof, under conditions which permit the formation of complexes between the substance 
and SHIP or SHIP related protein or an activated form thereof, and assaying for complexes, for 
free substance, for non-complexed SHIP or SHIP related protein or an activated form thereof, or 
for activation of SHIP. 

Still further, the invention provides a method for assaying a medium for the presence
of an agonist or antagonist of the interaction of SHIP, or a SHIP related protein or an activated
form thereof, and a substance which binds to SHIP, a SHIP related protein or an activated 
form thereof. In an embodiment, the method comprises providing a known concentration of 
SHIP, or a SHIP related protein, with a substance which is capable of binding to SHIP, or
SHIP related protein and a test substance under conditions which permit the formation of 
complexes between the substance and SHIP, or SHIP related protein, and assaying for 
complexes, for free substance, for non-complexed SHIP or SHIP related protein, or for 
activation of SHIP, or SHIP related protein. In a preferred embodiment of the invention, the 
substance is Shc or a part thereof, or an SH3-containing protein or part thereof.

Still further, the invention contemplates a method for assaying for the effect of a
substance on the phosphoIns-5-ptase activity of SHIP or a SHIP related protein having
phosphoIns-5-ptase activity comprising reacting a substrate which is capable of being
hydrolyzed by SHIP or a SHIP related protein to produce a hydrolysis product, with a test
substance under conditions which permit the hydrolysis of the substrate, determining the
amount of hydrolysis product, and comparing the amount of hydrolysis product obtained with
the amount obtained in the absence of the substance to determine the effect of the substance on
the phosphoIns-5-ptase activity of SHIP or the SHIP related protein.
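The comparison step of the assay above can be illustrated with a short calculation. The following is only a sketch: the function name and the numerical values are hypothetical and are not taken from the specification, which does not prescribe how the comparison is to be expressed.

```python
def percent_change_in_product(with_substance, without_substance):
    """Percent change in hydrolysis product relative to the substance-free control.

    Negative values suggest inhibition of the 5-phosphatase activity,
    positive values suggest stimulation.
    """
    if without_substance <= 0:
        raise ValueError("the control reaction must yield a positive amount of product")
    return 100.0 * (with_substance - without_substance) / without_substance


# Hypothetical amounts of hydrolysis product (arbitrary units), for illustration only.
control_product = 1000.0   # substrate + enzyme, no test substance
treated_product = 250.0    # substrate + enzyme + test substance

change = percent_change_in_product(treated_product, control_product)
print(f"{change:+.1f}% relative to control")
```

Expressing the result relative to the control makes measurements from separate experiments comparable, provided both reactions are run under the same conditions.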
Substances which affect SHIP or a SHIP related protein may also be identified using
the methods of the invention by comparing the pattern and level of expression of SHIP or a
SHIP related protein of the invention in tissues and cells in the presence, and in the absence of 
the substance. 

The substances identified using the method of the invention may be used in the 
treatment of conditions involving the perturbation of signalling pathways, and in particular in
the treatment of proliferative disorders. Accordingly, the substances may be formulated into
pharmaceutical compositions for administration to individuals suffering from one of these
conditions. 

Other objects, features and advantages of the present invention will become apparent 
from the following detailed description. It should be understood, however, that the detailed
description and the specific examples while indicating preferred embodiments of the invention 
are given by way of illustration only, since various changes and modifications within the 
spirit and scope of the invention will become apparent to those skilled in the art from this 
detailed description. 
DESCRIPTION OF THE DRAWINGS

The invention will be better understood with reference to the drawings in which: 
Figure 1 shows immunoblots of lysates prepared from B6SUtA1 cells, treated ± IL-3,
immunoprecipitated with anti-Shc, followed by protein A Sepharose (lanes 1&2) or
incubated with GSH bead bound GST-N-SH3 (lanes 3&4) or GSH bead bound GST-C-SH3
(lanes 5&6);

Figure 2 shows the amino acid sequence of murine SHIP (A) and a schematic diagram 
of the domains of the novel protein of the invention (B); 

Figure 3 shows the nucleic acid sequence of murine SHIP; 
Figure 4 shows immunoblots of lysates from B6SUtA1 cells, treated ± IL-3,
immunoprecipitated with anti-Shc (lanes 1&2), NRS (lanes 3&4) or anti-15mer (lanes 5&6) or
precleared with anti-15mer and then immunoprecipitated with anti-Shc (lanes 7&8) (A); and
lysates from B6SUtA1 cells, stimulated with IL-3, immunoprecipitated with anti-Shc (lane 1)
or anti-15mer (lane 2) and bound proteins eluted with SDS-sample buffer containing
N-ethylmaleimide in lieu of 2-mercaptoethanol (B);

Figure 5 shows Northern blot analysis of 2 µg of polyA RNA from various tissues
probed with a random primer-labeled PCR fragment encompassing a 1.5-kb fragment
corresponding to the 3' end of the p145 cDNA (lanes 1-6, spleen, lung, liver, skeletal muscle,
kidney and testes, respectively (Clontech); lane 7, separately prepared blot of bone marrow);

Figure 6 is a graph showing the results of anti-15mer, anti-Shc and NRS
immunoprecipitates with B6SUtA1 cell lysate incubated with [3H]Ins-1,3,4,5-P4 under
conditions where product formation was linear with time (A); and shows immunoblots of
anti-15mer, NRS and anti-Shc immunoprecipitates (as well as ± recombinant 5-ptase II, i.e. PtII & BL
(blank)) incubated with PtdIns[32P]-3,4,5-P3 under conditions where product formation was
linear with time and the reaction mixture chromatographed on TLC (B);

Figure 7 shows the amino acid sequence of Shc;

Figure 8 shows the nucleic acid sequence of Shc;

Figure 9 shows the amino acid and nucleic acid sequences of Grb2; 

Figure 10 shows the nucleic acid sequence of human SHIP; 

Figure 11 shows the amino acid sequence of human SHIP; 

Figure 12 shows a comparison of the amino acid sequences of human and murine SHIP; 

and 

Figure 13 shows a comparison of the nucleic acid sequences of human and murine SHIP. 
DETAILED DESCRIPTION OF THE INVENTION

The following standard abbreviations for the amino acid residues are used throughout 
the specification: A, Ala - alanine; C, Cys - cysteine; D, Asp - aspartic acid; E, Glu - glutamic
acid; F, Phe - phenylalanine; G, Gly - glycine; H, His - histidine; I, Ile - isoleucine; K, Lys -
lysine; L, Leu - leucine; M, Met - methionine; N, Asn - asparagine; P, Pro - proline; Q, Gln -
glutamine; R, Arg - arginine; S, Ser - serine; T, Thr - threonine; V, Val - valine; W, Trp -
tryptophan; Y, Tyr - tyrosine; and p.Y., PTyr - phosphotyrosine.
I. Nucleic Acid Molecules of the Invention

As hereinbefore mentioned, the invention provides an isolated and purified nucleic 
acid molecule having a sequence encoding an SH2-containing inositol-phosphatase (SHIP) 
which contains an SH2 domain and exhibits phosphoIns-5-ptase activity. The term "isolated 
and purified" refers to a nucleic acid substantially free of cellular material or culture medium 
when produced by recombinant DNA techniques, or chemical precursors, or other chemicals 
when chemically synthesized. An "isolated and purified" nucleic acid is also substantially 
free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3'
ends of the nucleic acid) from which the nucleic acid is derived. The term "nucleic acid" is 
intended to include DNA and RNA and can be either double stranded or single stranded. 

The murine SHIP coding region was cloned by purifying the protein based on Grb2-C- 
SH3 affinity chromatography. An unambiguous sequence obtained from the purified protein, 
VPAEGVSSLNEMINP, was used to construct a degenerate oligonucleotide probe. The full 
length cDNA was cloned using a PCR based strategy and a B6SUtA1 cDNA library as more
particularly described in the Example herein. The nucleic acid sequence of murine SHIP is
shown in Figure 3 or in SEQ.ID.NO. 1. The underlined ATG is the likely start site (starting at
nucleic acid 139). However, the predicted protein sequence shown in Figure 2 (A) (SEQ.ID.NO. 
2) is from an in frame ATG starting slightly upstream at nucleotide 130. The nucleotides from 
approximately 151 to 444 code for the SH2 domain; the nucleotides from 1886 to 1934, and 2144 
to 2167 code for 5-phosphatase motifs; the nucleotides from 1783 to 2130 code for the 5-ptase 
domain; nucleotides 2866-2880 and 3175 to 3189 code for the PTB domain target sequences, 
INPNY and ENPLY; and, the nucleotides 3013 to 3580 code for the proline-rich domain. 
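The coordinates quoted above can be cross-checked with simple arithmetic: an inclusive range of nucleotides encoding n residues should span 3n nucleotides, and two ATG codons are in the same reading frame when they lie a multiple of three nucleotides apart. The sketch below uses only coordinates quoted in this section (the 2830 to 2874 probe region is given later in the section); the helper names are illustrative and not part of the specification.

```python
def span_length(start, end):
    """Length in nucleotides of an inclusive, 1-based coordinate range."""
    return end - start + 1


def same_frame(position_a, position_b):
    """True if two 1-based positions are a whole number of codons apart."""
    return (position_a - position_b) % 3 == 0


# Murine SHIP coordinates as quoted in the specification (Figure 3 / SEQ ID NO:1).
peptide = "VPAEGVSSLNEMINP"
probe_region = (2830, 2874)
ptb_targets = {"INPNY": (2866, 2880), "ENPLY": (3175, 3189)}

# Each coding range should be three nucleotides per encoded residue.
print(span_length(*probe_region) == 3 * len(peptide))
for motif, (start, end) in ptb_targets.items():
    print(motif, span_length(start, end) == 3 * len(motif))

# The upstream ATG (130) and the likely start ATG (139) are 9 nt apart.
print(same_frame(130, 139))

# The SH2 domain range 151 to 444 expressed in codons.
print(span_length(151, 444) // 3, "codons")
```

A check of this kind only confirms that the quoted ranges are internally consistent; it does not confirm the positions against the sequence itself.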

The nucleic acid sequence of human SHIP is shown in Figure 10 and Figure 13 (or in
SEQ.ID.NO. 7). The human SHIP gene was mapped to chromosome 2 at the junction between 
q36 and q37. The nucleotides from approximately 141 to 434 in Figure 10 (SEQ.ID.NO. 7) code 
for the SH2 domain; the nucleotides from 1876 to 1924 and 2134 to 2157 in Figure 10 code for 5- 
phosphatase motifs; the nucleotides from 1773 to 2120 in Figure 10 code for the 5-ptase domain; 
nucleotides 2856 to 2870 and 3177 to 3191 in Figure 10 code for the PTB domain target sequences, 
INPNY and ENPLY; and the nucleotides 3009 to 3564 in Figure 10 code for the proline-rich 
domain. Figure 13 shows a comparison of the nucleic acid sequences encoding human SHIP and 
murine SHIP. The nucleic acid sequences encoding human and murine SHIP are 81.6% identical. 

The invention includes nucleic acids having substantial homology or identity with the 
nucleic acid sequences encoding human and murine SHIP. Homology or identity refers to 
sequence similarity between the nucleic acid sequences and it may be determined by comparing 
a position in each sequence which is aligned for purposes of comparison. When a position in 
the compared sequence is occupied by the same nucleotide base, then the molecules are 
identical or homologous at that position. 
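The position-by-position comparison described above can be sketched as follows. This is a minimal illustration that assumes the two sequences have already been aligned to equal length; the treatment of gap columns (skipped here) and the short example sequences are assumptions and are not taken from the specification.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap positions occupied by the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # assumption: columns containing a gap are not scored
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
fragment_1 = "ATGGCCTA-G"
fragment_2 = "ATGACCTAAG"
print(f"{percent_identity(fragment_1, fragment_2):.1f}% identical")
```

Different conventions for scoring gaps give different percentages for the same alignment, so the convention should be stated whenever an identity figure is reported.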

It will be appreciated that the invention includes nucleic acid molecules encoding 
truncations of SHIP, and analogs and homologs of SHIP and truncations thereof (i.e., SHIP 
related proteins), as described herein. It will further be appreciated that variant forms of the 
nucleic acid molecules of the invention which arise by alternative splicing of an mRNA 
corresponding to a cDNA of the invention are encompassed by the invention. 

Another aspect of the invention provides a nucleic acid molecule which hybridizes 
under high stringency conditions to a nucleic acid molecule which comprises a sequence which 
encodes SHIP having the amino acid sequence shown in Figure 2 (A) or SEQ ID NO:2, or Figure 
11 or SEQ ID NO:8, or to a SHIP related protein, and preferably having the activity of SHIP.
Appropriate stringency conditions which promote DNA hybridization are known to those
skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons,
N.Y. (1989), 6.3.1-6.3.6. For example, 6.0 x sodium chloride/sodium citrate (SSC) at about
45°C, followed by a wash of 2.0 x SSC at 50°C may be employed. The stringency may be
selected based on the conditions used in the wash step. By way of example, the salt
concentration in the wash step can be selected for high stringency at about 0.2 x SSC at 50°C.
In addition, the temperature in the wash step can be at high stringency conditions, at about
65°C.
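To make the salt concentrations above concrete, the following sketch converts an SSC dilution into molar concentrations. It assumes the common laboratory definition of 1 x SSC as 0.15 M sodium chloride and 0.015 M trisodium citrate; that definition is not stated in the specification and is an assumption of this example.

```python
# Assumed composition of 1 x SSC (standard laboratory convention, not from the specification).
NACL_MOLAR_PER_1X = 0.15
CITRATE_MOLAR_PER_1X = 0.015


def ssc_composition(fold):
    """Return (NaCl M, trisodium citrate M, total Na+ M) for a given SSC strength."""
    nacl = fold * NACL_MOLAR_PER_1X
    citrate = fold * CITRATE_MOLAR_PER_1X
    sodium = nacl + 3 * citrate  # trisodium citrate contributes three Na+ per molecule
    return nacl, citrate, sodium


# SSC strengths mentioned in the paragraph above.
for fold in (6.0, 2.0, 0.2):
    nacl, citrate, sodium = ssc_composition(fold)
    print(f"{fold:>4} x SSC: NaCl {nacl:.3f} M, citrate {citrate:.4f} M, Na+ {sodium:.3f} M")
```

Lower sodium concentration in the wash destabilizes mismatched duplexes, which is why the 0.2 x SSC wash is the more stringent condition.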

Isolated and purified nucleic acid molecules encoding a protein having the activity of
SHIP as described herein, and having a sequence which differs from the nucleic acid sequence
shown in SEQ ID NO:1 or Figure 3, or SEQ ID NO:7 or Figure 10, due to degeneracy in the
genetic code are also within the scope of the invention. Such nucleic acids encode functionally
equivalent proteins (e.g., a protein having SH2-containing inositol-phosphatase activity) but
differ in sequence from the sequence of SEQ ID NO:1 or Figure 3, or SEQ ID NO:7 or Figure 10,
due to degeneracy in the genetic code.
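The effect of degeneracy can be illustrated by translating two different nucleotide sequences with the standard genetic code and observing that they encode the same peptide. The two 15-nucleotide sequences below are hypothetical, chosen only because both encode the PTB target sequence INPNY; they are not taken from SEQ ID NO:1 or SEQ ID NO:7.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleic_acid):
    """Translate a coding sequence (T or U accepted) in frame 1; '*' marks a stop."""
    dna = nucleic_acid.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical sequences differing at the third base of every codon.
sequence_a = "ATCAACCCCAACTAC"
sequence_b = "ATTAATCCGAATTAT"

differences = sum(a != b for a, b in zip(sequence_a, sequence_b))
print(translate(sequence_a), translate(sequence_b), differences)
```

Because the function replaces U with T before lookup, it also accepts RNA sequences, consistent with the statement that T can also be U.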

In addition, DNA sequence polymorphisms within the nucleotide sequence of SHIP 
(especially those within the third base of a codon) may result in "silent" mutations in the 
DNA which do not affect the amino acid encoded. However, DNA sequence polymorphisms
may lead to changes in the amino acid sequences of SHIP within a population. It will be
appreciated by one skilled in the art that these variations in one or more nucleotides (up to 
about 3-4% of the nucleotides) of the nucleic acids encoding proteins having the activity of 
SHIP may exist among individuals within a population due to natural allelic variation. Any 
and all such nucleotide variations and resulting amino acid polymorphisms are within the
scope of the invention.

An isolated and purified nucleic acid molecule of the invention which comprises DNA 
can be isolated by preparing a labelled nucleic acid probe based on all or part of the nucleic 
acid sequence shown in SEQ ID NO:1 or Figure 3, (for example, nucleotides 2830 to 2874
encoding VPAEGVSSLNEMINP; nucleotides encoding NEMINP or VPAEGV; or nucleotides 151
to 444 encoding the SH2 domain), or based on all or part of the nucleic acid sequence shown in
SEQ ID NO: 7 or Figure 10, and using this labelled nucleic acid probe to screen an appropriate 
DNA library (e.g. a cDNA or genomic DNA library). For instance, a cDNA library made from 
hemopoietic cells can be used to isolate a cDNA encoding a protein having SHIP activity by 
screening the library with the labelled probe using standard techniques. Alternatively, a
genomic DNA library can be similarly screened to isolate a genomic clone encompassing a gene
encoding a protein having SH2-containing inositol-phosphatase activity. Nucleic acids 
isolated by screening of a cDNA or genomic DNA library can be sequenced by standard 
techniques. 
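When a probe is designed from peptide sequence rather than from known nucleotide sequence, the number of distinct nucleotide sequences that could encode the peptide (its degeneracy) affects the choice of region. The sketch below counts codons per residue under the standard genetic code for the two hexapeptides named above. It is an illustration only; the specification does not state that this calculation was used to select the probe regions.

```python
from collections import Counter
from math import prod

# Standard genetic code residues in T, C, A, G codon order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_RESIDUE = Counter(AMINO_ACIDS)


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the given peptide."""
    counts = []
    for residue in peptide.upper():
        if residue == "*" or residue not in CODONS_PER_RESIDUE:
            raise ValueError(f"not a standard amino acid: {residue!r}")
        counts.append(CODONS_PER_RESIDUE[residue])
    return prod(counts)


for region in ("NEMINP", "VPAEGV"):
    print(region, degeneracy(region))
```

Regions rich in residues with one or two codons, such as methionine, asparagine and glutamic acid, give less degenerate oligonucleotide pools than regions rich in four-codon or six-codon residues.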
An isolated and purified nucleic acid molecule of the invention which is DNA can also
be isolated by selectively amplifying a nucleic acid encoding SHIP using the polymerase chain
reaction (PCR) methods and cDNA or genomic DNA. It is possible to design synthetic
oligonucleotide primers from the nucleotide sequence shown in SEQ ID NO:1 or Figure 3, or
shown in SEQ ID NO:7 or Figure 10, for use in PCR. A nucleic acid can be amplified from cDNA
or genomic DNA using these oligonucleotide primers and standard PCR amplification
techniques. The nucleic acid so amplified can be cloned into an appropriate vector and
characterized by DNA sequence analysis. It will be appreciated that cDNA may be prepared
from mRNA, by isolating total cellular mRNA by a variety of techniques, for example, by
using the guanidinium-thiocyanate extraction procedure of Chirgwin et al., Biochemistry, 18,
5294-5299 (1979). cDNA is then synthesized from the mRNA using reverse transcriptase (for
example, Moloney MLV reverse transcriptase available from Gibco/BRL, Bethesda, MD, or
AMV reverse transcriptase available from Seikagaku America, Inc., St. Petersburg, FL).
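As a rough aid when designing primers, the melting temperature of a short oligonucleotide is often estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. The rule is not part of the specification, and the primer below is a hypothetical sequence used only for illustration; the estimate is approximate and is generally applied only to oligonucleotides of about 14 to 20 bases.

```python
def wallace_tm(primer):
    """Approximate melting temperature (deg C) of a short DNA oligonucleotide."""
    sequence = primer.upper()
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in primer: {sorted(unexpected)}")
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    return 2 * at_count + 4 * gc_count


# Hypothetical 18-mer, not taken from SEQ ID NO:1 or SEQ ID NO:7.
example_primer = "ATGGTCCCTGCCGAGGGT"
print(len(example_primer), "nt, estimated Tm", wallace_tm(example_primer), "deg C")
```

Primer pairs are usually chosen so that their estimated melting temperatures are close to one another, which allows a single annealing temperature to be used.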

An isolated and purified nucleic acid molecule of the invention which is RNA can be
isolated by cloning a cDNA encoding SHIP into an appropriate vector which allows for
transcription of the cDNA to produce an RNA molecule which encodes a protein which
exhibits phosphoIns-5-ptase activity. For example, a cDNA can be cloned downstream of a
bacteriophage promoter (e.g. a T7 promoter) in a vector, the cDNA can be transcribed in vitro with
T7 polymerase, and the resultant RNA can be isolated by standard techniques.

A nucleic acid molecule of the invention may also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing polydeoxynucleotides are
known, including solid-phase synthesis which, like peptide synthesis, has been fully 
automated in commercially available DNA synthesizers (See e.g., Itakura et al. U.S. Patent 
No. 4,598,049; Caruthers et al. U.S. Patent No. 4,458,066; and Itakura U.S. Patent Nos.
4,401,796 and 4,373,071).

Determination of whether a particular nucleic acid molecule encodes a protein having
SHIP activity can be accomplished by expressing the cDNA in an appropriate host cell by
standard techniques, and testing the ability of the expressed protein to associate with Shc
and/or hydrolyze a substrate as described herein. A cDNA having the biological activity of
SHIP so isolated can be sequenced by standard techniques, such as dideoxynucleotide chain
termination or Maxam-Gilbert chemical sequencing, to determine the nucleic acid sequence and
the predicted amino acid sequence of the encoded protein.

The initiation codon and untranslated sequences of SHIP or a SHIP related protein 
may be determined using currently available computer software designed for the purpose, such
as PC/Gene (IntelliGenetics Inc., Calif.). The intron-exon structure and the transcription
regulatory sequences of the gene encoding the SHIP protein may be identified by using a nucleic 
acid molecule of the invention encoding SHIP to probe a genomic DNA clone library. 
Regulatory elements can be identified using conventional techniques. The function of the 
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elements can be confirmed by using these elements to express a reporter gene such as the 
bacterial gene lacZ which is operatively linked to the elements. These constructs may be 
introduced into cultured cells using standard procedures or into non-human transgenic animal 
models. In addition to identifying regulatory elements in DNA, such constructs mav also be 
5 used to identify nuclear proteins interacting with the elements, using techniques known in the 
art. 

The 5' untranslated region of murine SHIP comprises nucleotides 1 to 138 in Figure 2(A) 
or SEQ ID. NO. 1, and the 5' untranslated region of human SHIP comprises nucleotides 1 to 128 
in Figure 10 or SEQ ID. NO. 7. 

The sequence of a nucleic acid molecule of the invention may be inverted relative to its
normal presentation for transcription to produce an antisense nucleic acid molecule. An
antisense nucleic acid molecule may be constructed using chemical synthesis and enzymatic
ligation reactions using procedures known in the art.
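
By way of illustration only, the relationship between a sense sequence and its antisense counterpart may be sketched computationally as a reverse complement. The function name and the short fragment below are hypothetical and are not taken from the sequence listing; the sketch assumes an unambiguous DNA alphabet (A, C, G, T).

```python
# Complement table for the four DNA bases (this sketch assumes no ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5' to 3'."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical sense-strand fragment, for illustration only.
sense = "ATGGTCCCT"
antisense = reverse_complement(sense)
```

Applying the function twice returns the original sequence, which provides a simple consistency check on any antisense design.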

II. SHIP Proteins of the Invention

The amino acid sequence of murine SHIP is shown in SEQ. ID. NO. 2 or in Figure 2(A)
and the amino acid sequence of human SHIP is shown in SEQ. ID. NO. 8 or in Figure 11. SHIP
contains a number of well-characterized regions including an amino terminal src homology 2
(SH2) domain containing the sequence DGSFLVR which is highly conserved among SH2
domains; two phosphotyrosine binding (PTB) consensus sequences; proline rich regions near the
carboxy terminus containing a class I sequence (PPSQPPLSP) and class II sequences (PVKPSR,
PPLSPKK, and PPLPVK); and two motifs highly conserved among inositol
polyphosphate-5-phosphatases (i.e. the sequences WLGDLNYR and KYNLPSWCDRVLW).
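
As a non-limiting illustration, consensus sequences of the NPXY form (Asn-Pro-any residue-Tyr), the form of the PTB domain target sequences referred to in this section, may be located in an amino acid sequence with a simple pattern search. The function name and the toy sequence below are hypothetical; the toy sequence is not the SHIP sequence.

```python
import re

# NPXY consensus written as a regular expression: N, P, any one residue, Y.
NPXY = re.compile(r"NP.Y")


def find_npxy(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start index, matched tetrapeptide) for each NPXY site.

    Matches are non-overlapping, which is how re.finditer scans a string.
    """
    return [(m.start(), m.group()) for m in NPXY.finditer(protein.upper())]


# Hypothetical toy sequence, for illustration only.
toy = "GGENPLYAAANPQYTT"
sites = find_npxy(toy)
```

The same approach may be used with the literal motifs recited above (for example DGSFLVR or WLGDLNYR) by searching for the exact string instead of a pattern.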

The SHIP protein is expressed in many cell types including hemopoietic cells, bone 
marrow, lung, spleen, muscles, testes, and kidney. 

In addition to the full length SHIP amino acid sequence (SEQ. ID. NO. 2 or Figure 2(A);
SEQ. ID. NO. 8 or Figure 11), the proteins of the present invention include truncations of SHIP,
and analogs, and homologs of SHIP and truncations thereof as described herein. Truncated
proteins may comprise peptides of between 3 and 1090 amino acid residues, ranging in size from
a tripeptide to a 1090 mer polypeptide. For example, a truncated protein may comprise the
SH2 domain (the amino acids encoded by nucleotides 151 to 444 as shown in Figure 3 and
encoded by nucleotides 141 to 434 in Figure 10); the proline rich regions (the amino acids
encoded by nucleotides 3013 to 3580 in Figure 3 and encoded by nucleotides 3009 to 3564 in Figure
10); the 5-phosphatase motifs (amino acids encoded by nucleotides 1886 to 1934 and 2144 to
2167 in Figure 3 and encoded by nucleotides 1876 to 1924 and 2134 to 2157 in Figure 10); the
5-ptase domain (the amino acids encoded by nucleotides 1783 to 2130 in Figure 3 and encoded by
nucleotides 1773 to 2120 in Figure 10); the PTB domain target sequences, INPNY and ENPLY
(the amino acids encoded by nucleotides 2866 to 2880 and 3175 to 3189 in Figure 3 and encoded by
nucleotides 2856 to 2870 and 3177 to 3191 in Figure 10); or an NPXY sequence of SHIP.
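
For illustration, the number of amino acids encoded by an inclusive nucleotide span such as those recited above may be estimated by dividing the span length by three. The following is a minimal sketch; the function name is hypothetical, and the sketch assumes that the span begins at the first base of a codon.

```python
def span_to_codons(first_nt: int, last_nt: int) -> tuple[int, int]:
    """Return (whole codons, leftover bases) for an inclusive nucleotide span."""
    if last_nt < first_nt:
        raise ValueError("last_nt must not be smaller than first_nt")
    length = last_nt - first_nt + 1  # both end positions are counted
    return divmod(length, 3)


# Spans taken from the text above (Figure 3 numbering).
sh2_domain = span_to_codons(151, 444)      # SH2 domain
ptb_target = span_to_codons(2866, 2880)    # first PTB domain target sequence
ptase_domain = span_to_codons(1783, 2130)  # 5-ptase domain
```

A non-zero second value indicates that the recited span does not end on a codon boundary under the stated assumption.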

The truncated proteins may have an amino group (-NH2), a hydrophobic group (for
example, carbobenzoxyl, dansyl, or t-butyloxycarbonyl), an acetyl group, a
9-fluorenylmethoxy-carbonyl (FMOC) group, or a macromolecule including but not limited to
lipid-fatty acid conjugates, polyethylene glycol, or carbohydrates at the amino terminal end.
The truncated proteins may have a carboxyl group, an amido group, a t-butyloxycarbonyl
group, or a macromolecule including but not limited to lipid-fatty acid conjugates,
polyethylene glycol or carbohydrates at the carboxy terminal end. An isoprenoid may also be
attached to a truncated protein comprising the 5-ptase domain to localize SHIP 5-ptase to the
inside of the plasma membrane.

The proteins of the invention may also include analogs of SHIP as shown in SEQ. ID.
NO. 2 or Figure 2(A), or as shown in SEQ. ID. NO. 8 or Figure 11, and/or truncations thereof as
described herein, which may include, but are not limited to, SHIP (SEQ. ID. NO. 2 or Figure
2(A); SEQ. ID. NO. 8 or Figure 11), containing one or more amino acid substitutions, insertions,
and/or deletions. Amino acid substitutions may be of a conserved or non-conserved nature.
Conserved amino acid substitutions involve replacing one or more amino acids of the SHIP
amino acid sequence with amino acids of similar charge, size, and/or hydrophobicity
characteristics. When only conserved substitutions are made the resulting analog should be
functionally equivalent to SHIP (SEQ. ID. NO. 2 or Figure 2(A); SEQ. ID. NO. 8 or Figure 11).
Non-conserved substitutions involve replacing one or more amino acids of the SHIP amino acid
sequence with one or more amino acids which possess dissimilar charge, size, and/or
hydrophobicity characteristics. By way of example, D675 may be replaced with A675 in
Figure 2(A) (or 672 in Figure 11) to create an analog which does not have 5-ptase activity.
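
The distinction between conserved and non-conserved substitutions may be illustrated with a coarse residue grouping. The grouping below is only one common textbook classification and is an assumption of this sketch, not a definition given in this specification; the function name is likewise hypothetical.

```python
# Coarse physicochemical classes (an assumed, simplified grouping).
RESIDUE_CLASSES = {
    "acidic": "DE",
    "basic": "KRH",
    "polar_uncharged": "STNQ",
    "nonpolar_aliphatic": "AVLIMGP",
    "aromatic": "FWY",
    "cysteine": "C",
}

# Invert the table so each one-letter code maps to its class name.
CLASS_OF = {aa: name for name, members in RESIDUE_CLASSES.items() for aa in members}


def is_conserved(original: str, replacement: str) -> bool:
    """Return True when both residues fall in the same assumed class."""
    return CLASS_OF[original.upper()] == CLASS_OF[replacement.upper()]


# The D-to-A replacement given as an example above changes class
# (acidic to nonpolar), so this sketch reports it as non-conserved.
d_to_a = is_conserved("D", "A")
d_to_e = is_conserved("D", "E")
```

Under this grouping a D to E replacement is reported as conserved, whereas the D to A replacement is not, in keeping with the example above in which activity is lost.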

One or more amino acid insertions may be introduced into SHIP (SEQ. ID. NO. 2 or
Figure 2(A); SEQ. ID. NO. 8 or Figure 11). Amino acid insertions may consist of single amino
acid residues or sequential amino acids ranging from 2 to 15 amino acids in length. For example,
amino acid insertions may be used to destroy the PTB domain target sequences or the
proline-rich consensus sequences so that SHIP can no longer bind SH3-containing proteins.

Deletions may consist of the removal of one or more amino acids, or discrete portions
(e.g. one or more of the SH2 domain, PTB consensus sequences; the sequences conserved among
inositol polyphosphate-5-phosphatases) from the SHIP (SEQ. ID. NO. 2 or Figure 2(A), SEQ.
ID. NO. 8 or Figure 11) sequence. The deleted amino acids may or may not be contiguous. The
lower limit length of the resulting analog with a deletion mutation is about 10 amino acids,
preferably 100 amino acids.

It is anticipated that if amino acids are replaced, inserted or deleted in sequences
outside the amino terminal src homology 2 (SH2) domain, the phosphotyrosine binding (PTB)
consensus sequences, the proline rich region and motifs highly conserved among inositol
polyphosphate-5-phosphatases, the resulting analog of SHIP will associate with Shc
and exhibit phosphoIns-5-ptase activity.

The proteins of the invention also include homologs of SHIP (SEQ. ID. NO. 2 or Figure
2(A); SEQ. ID. NO. 8 or Figure 11) and/or truncations thereof as described herein. Homology or
identity refers to sequence similarity between sequences and it may be determined by comparing
a position in each sequence which may be aligned for purposes of comparison. A degree of
homology between sequences is a function of the number of matching positions shared by the
sequences. Homologs will generally have the same regions which are characteristic of SHIP,
namely an amino terminal src homology 2 (SH2) domain, two phosphotyrosine binding (PTB)
consensus sequences, a proline rich region and two motifs highly conserved among inositol
polyphosphate-5-phosphatases. It is anticipated that, outside of the well-characterized
regions of SHIP specified herein (i.e. SH2 domain, PTB domain etc.), a protein comprising an
amino acid sequence which is about 50% similar, preferably 80 to 90% similar, with the amino
acid sequences shown in SEQ. ID. NO. 2 or Figure 2(A), or SEQ. ID. NO. 8 or Figure 11, will
exhibit phosphoIns-5-ptase activity and associate with Shc.

A comparison of the amino acid sequences of murine and human SHIP is shown in
Figure 12. As shown in Figure 12, human and murine SHIP are 87.2% identical at the amino
acid level.
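
Because the degree of homology is described as a function of the number of matching positions in aligned sequences, a percent identity figure of the kind quoted above may be sketched as follows. This is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length, gaps are written as "-", and positions containing a gap are excluded from the denominator. The specification does not state which denominator was used for the 87.2% figure, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned positions at which the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumed convention: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * matches / compared


# Toy example: the SH2 sequence DGSFLVR against a hypothetical one-residue variant.
identity = percent_identity("DGSFLVR", "DGSYLVR")
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, give different values for the same alignment, so the convention should be stated alongside any reported figure.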

The invention also contemplates isoforms of the protein of the invention. An isoform
contains the same number and kinds of amino acids as the protein of the invention, but the
isoform has a different molecular structure. The isoforms contemplated by the present
invention are those having the same properties as the protein of the invention as described
herein.

The present invention also includes SHIP or a SHIP related protein conjugated with a
selected protein, or a selectable marker protein (see below) to produce fusion proteins. Further,
the present invention also includes activated or phosphorylated SHIP proteins of the
invention. Additionally, immunogenic portions of SHIP and SHIP related proteins are within
the scope of the invention.

SHIP and SHIP related proteins of the invention may be prepared using recombinant
DNA methods. Accordingly, the nucleic acid molecules of the present invention having a
sequence which encodes SHIP or a SHIP related protein of the invention may be incorporated in
a known manner into an appropriate expression vector which ensures good expression of the
protein. Possible expression vectors include but are not limited to cosmids, plasmids, or
modified viruses (e.g. replication defective retroviruses, adenoviruses and adeno-associated
viruses), so long as the vector is compatible with the host cell used. The expression vectors are
"suitable for transformation of a host cell", which means that the expression vectors contain a
nucleic acid molecule of the invention and regulatory sequences selected on the basis of the host
cells to be used for expression, which are operatively linked to the nucleic acid molecule.
Operatively linked is intended to mean that the nucleic acid is linked to regulatory sequences
in a manner which allows expression of the nucleic acid.

The invention therefore contemplates a recombinant expression vector of the invention
containing a nucleic acid molecule of the invention, or a fragment thereof, and the necessary
regulatory sequences for the transcription and translation of the inserted protein sequence.
Suitable regulatory sequences may be derived from a variety of sources, including bacterial,
fungal, viral, mammalian, or insect genes (For example, see the regulatory sequences described
in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San
Diego, CA (1990)). Selection of appropriate regulatory sequences is dependent on the host cell
chosen as discussed below, and may be readily accomplished by one of ordinary skill in the art.
Examples of such regulatory sequences include: a transcriptional promoter and enhancer or
RNA polymerase binding sequence, a ribosomal binding sequence, including a translation
initiation signal. Additionally, depending on the host cell chosen and the vector employed,
other sequences, such as an origin of replication, additional DNA restriction sites, enhancers,
and sequences conferring inducibility of transcription may be incorporated into the expression
vector. It will also be appreciated that the necessary regulatory sequences may be supplied by
the native SHIP and/or its flanking regions.

The invention further provides a recombinant expression vector comprising a DNA
nucleic acid molecule of the invention cloned into the expression vector in an antisense
orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a
manner which allows for expression, by transcription of the DNA molecule, of an RNA
molecule which is antisense to the nucleotide sequence of SEQ. ID. NO. 1 or Figure 2(A), or SEQ.
ID. NO. 8 or Figure 11. Regulatory sequences operatively linked to the antisense nucleic acid
can be chosen which direct the continuous expression of the antisense RNA molecule in a
variety of cell types, for instance a viral promoter and/or enhancer, or regulatory sequences can
be chosen which direct tissue or cell type specific expression of antisense RNA.

The recombinant expression vectors of the invention may also contain a selectable
marker gene which facilitates the selection of host cells transformed or transfected with a
recombinant molecule of the invention. Examples of selectable marker genes are genes encoding
a selectable marker protein such as G418 and hygromycin which confer resistance to certain
drugs, β-galactosidase, chloramphenicol acetyltransferase, firefly luciferase, or an
immunoglobulin or portion thereof such as the Fc portion of an immunoglobulin, preferably IgG.
Transcription of the selectable marker gene is monitored by changes in the concentration of the
selectable marker protein such as β-galactosidase, chloramphenicol acetyltransferase, or
firefly luciferase. If the selectable marker gene encodes a protein conferring antibiotic
resistance such as neomycin resistance, transformant cells can be selected with G418. Cells that
have incorporated the selectable marker gene will survive, while the other cells die. This
makes it possible to visualize and assay for expression of recombinant expression vectors of the
invention and in particular to determine the effect of a mutation on expression and phenotype.
It will be appreciated that selectable markers can be introduced on a separate vector from the
nucleic acid of interest.

The recombinant expression vectors may also contain genes which encode a fusion
moiety which provides increased expression of the recombinant protein; increased solubility of
the recombinant protein; and aids in the purification of the target recombinant protein by
acting as a ligand in affinity purification. For example, a proteolytic cleavage site may be
added to the target recombinant protein to allow separation of the recombinant protein from
the fusion moiety subsequent to purification of the fusion protein. Typical fusion expression
vectors include pGEX (Amrad Corp., Melbourne, Australia), pMAL (New England Biolabs,
Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione S-transferase
(GST), maltose E binding protein, or protein A, respectively, to the recombinant protein.

Recombinant expression vectors can be introduced into host cells to produce a
transformant host cell. The term "transformant host cell" is intended to include prokaryotic
and eukaryotic cells which have been transformed or transfected with a recombinant
expression vector of the invention. The terms "transformed with", "transfected with",
"transformation" and "transfection" are intended to encompass introduction of nucleic acid (e.g.
a vector) into a cell by one of many possible techniques known in the art. Prokaryotic cells can
be transformed with nucleic acid by, for example, electroporation or calcium-chloride
mediated transformation. Nucleic acid can be introduced into mammalian cells via
conventional techniques such as calcium phosphate or calcium chloride co-precipitation,
DEAE-dextran-mediated transfection, lipofectin, electroporation or microinjection. Suitable
methods for transforming and transfecting host cells can be found in Sambrook et al. (Molecular
Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and
other laboratory textbooks.

Suitable host cells include a wide variety of prokaryotic and eukaryotic host cells.
For example, the proteins of the invention may be expressed in bacterial cells such as E. coli,
insect cells (using baculovirus), yeast cells or mammalian cells. Other suitable host cells can be
found in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press,
San Diego, CA (1991).

More particularly, bacterial host cells suitable for carrying out the present invention
include E. coli, B. subtilis, Salmonella typhimurium, and various species within the genera
Pseudomonas, Streptomyces, and Staphylococcus, as well as many other bacterial species well
known to one of ordinary skill in the art. Suitable bacterial expression vectors preferably
comprise a promoter which functions in the host cell, one or more selectable phenotypic
markers, and a bacterial origin of replication. Representative promoters include the
β-lactamase (penicillinase) and lactose promoter system (see Chang et al., Nature 275:615,
1978), the trp promoter (Nichols and Yanofsky, Meth in Enzymology 101:155, 1983) and the tac
promoter (Russell et al., Gene 20: 231, 1982). Representative selectable markers include various
antibiotic resistance markers such as the kanamycin or ampicillin resistance genes. Suitable
expression vectors include but are not limited to bacteriophages such as lambda derivatives or
plasmids such as pBR322 (see Bolivar et al., Gene 2:95, 1977), the pUC plasmids pUC18,
pUC19, pUC118, pUC119 (see Messing, Meth in Enzymology 101:20-77, 1983 and Vieira and
Messing, Gene 19:259-268, 1982), and pNH8A, pNH16a, pNH18a, and Bluescript M13
(Stratagene, La Jolla, Calif.). Typical fusion expression vectors which may be used are
discussed above, e.g. pGEX (Amrad Corp., Melbourne, Australia), pMAL (New England
Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ). Examples of inducible
non-fusion expression vectors include pTrc (Amann et al., (1988) Gene 69:301-315) and pET 11d
(Studier et al., Gene Expression Technology: Methods in Enzymology 185, Academic Press, San
Diego, California (1990) 60-89).

Yeast and fungi host cells suitable for carrying out the present invention include, but
are not limited to Saccharomyces cerevisiae, the genera Pichia or Kluyveromyces and various
species of the genus Aspergillus. Examples of vectors for expression in yeast S. cerevisiae
include pYepSec1 (Baldari et al., (1987) EMBO J. 6:229-234), pMFa (Kurjan and Herskowitz,
(1982) Cell 30:933-943), pJRY88 (Schultz et al., (1987) Gene 54:113-123), and pYES2 (Invitrogen
Corporation, San Diego, CA). Protocols for the transformation of yeast and fungi are well
known to those of ordinary skill in the art (see Hinnen et al., PNAS USA 75:1929, 1978; Itoh et
al., J. Bacteriology 153:163, 1983; and Cullen et al., Bio/Technology 5:369, 1987).

Mammalian cells suitable for carrying out the present invention include, among others: 
COS (e.g., ATCC No. CRL 1650 or 1651), BHK (e.g., ATCC No. CRL 6281), CHO (ATCC No. 
CCL 61), HeLa (e.g., ATCC No. CCL 2), 293 (ATCC No. 1573) and NS-1 cells. Suitable 
expression vectors for directing expression in mammalian cells generally include a promoter 
(e.g., derived from viral material such as polyoma, Adenovirus 2, cytomegalovirus and Simian 
Virus 40), as well as other transcriptional and translational control sequences. Examples of 
mammalian expression vectors include pCDM8 (Seed, B., (1987) Nature 329:840) and pMT2PC 
(Kaufman et al. (1987), EMBO J. 6:187-195).

Given the teachings provided herein, promoters, terminators, and methods for 
introducing expression vectors of an appropriate type into plant, avian, and insect cells may 
also be readily accomplished. For example, within one embodiment, the proteins of the 
invention may be expressed from plant cells (see Sinkar et al., J. Biosci (Bangalore) 11:47-58, 
1987, which reviews the use of Agrobacterium rhizogenes vectors; see also Zambryski et al., 
Genetic Engineering, Principles and Methods, Hollaender and Setlow (eds.), Vol. VI, pp. 
253-278, Plenum Press, New York, 1984, which describes the use of expression vectors for plant 
cells, including, among others, pAS2022, pAS2023, and pAS2034). 

Insect cells suitable for carrying out the present invention include cells and cell lines
from Bombyx or Spodoptera species. Baculovirus vectors available for expression of proteins in
cultured insect cells (SF 9 cells) include the pAc series (Smith et al., (1983) Mol. Cell Biol.
3:2156-2165) and the pVL series (Luckow, V.A., and Summers, M.D., (1989) Virology 170:31-
39).

Alternatively, the proteins of the invention may also be expressed in non-human
transgenic animals such as rats, rabbits, sheep and pigs (see Hammer et al. (Nature
315:680-683, 1985), Palmiter et al. (Science 222:809-814, 1983), Brinster et al. (Proc. Natl. Acad.
Sci. USA 82:4438-4442, 1985), Palmiter and Brinster (Cell 41:343-345, 1985) and U.S. Patent No.
4,736,866).

The proteins of the invention may also be prepared by chemical synthesis using
techniques well known in the chemistry of proteins such as solid phase synthesis (Merrifield,
1964, J. Am. Chem. Soc. 85:2149-2154) or synthesis in homogeneous solution (Houben-Weyl,
1987, Methods of Organic Chemistry, ed. E. Wünsch, Vol. 15 I and II, Thieme, Stuttgart).

N-terminal or C-terminal fusion proteins comprising SHIP or a SHIP related protein of
the invention conjugated with other molecules, such as proteins, may be prepared by fusing,
through recombinant techniques, the N-terminal or C-terminal of SHIP or a SHIP related
protein, and the sequence of a selected protein or selectable marker protein with a desired
biological function. The resultant fusion proteins contain SHIP or a SHIP related protein fused
to the selected protein or marker protein as described herein. Examples of proteins which may
be used to prepare fusion proteins include immunoglobulins, glutathione-S-transferase (GST),
hemagglutinin (HA), and truncated myc. The present inventor has made GST fusion proteins
containing the SH2 domain of SHIP and GST fusion proteins containing the 5-ptase domain
attached to an isoprenoid to localize SHIP 5-ptase to the inside of the plasma membrane.

Phosphorylated or activated SHIP or SHIP related proteins of the invention may be
prepared using the method described in Reedijk et al. The EMBO Journal 11(4):1365, 1992. For
example, tyrosine phosphorylation may be induced by infecting bacteria harbouring a plasmid
containing a nucleotide sequence of the invention, with a λgt11 bacteriophage encoding the
cytoplasmic domain of the Elk tyrosine kinase as an Elk fusion protein. Bacteria containing
the plasmid and bacteriophage as a lysogen are isolated. Following induction of the lysogen,
the expressed protein becomes phosphorylated by the tyrosine kinase.

III. Utility of the Nucleic Acid Molecules and Proteins of the Invention

The nucleic acid molecules of the invention allow those skilled in the art to construct
nucleotide probes for use in the detection of nucleic acid sequences in biological materials.
Suitable probes include nucleic acid molecules based on nucleic acid sequences encoding at least
6 sequential amino acids from regions of the SHIP protein as shown in SEQ. ID. NO. 2 or Figure
2(A), and SEQ. ID. NO. 8 or Figure 11. For example, a probe may be based on the nucleotides 2830
to 2874 in Figure 3 (or SEQ. ID. NO. 1) encoding VPAEGVSSLNEMINP; the nucleotides encoding
NEMINP or VPAEGV; or the nucleotides 151 to 445 in Figure 3 (or SEQ. ID. NO. 1) encoding the
SH2 domain. Preferably, the probe comprises a 1 to 1.5 kb segment corresponding to the 5' and
3' ends of the 5 kb SHIP mRNA. A nucleotide probe may be labelled with a detectable
substance such as a radioactive label which provides for an adequate signal and has sufficient
half-life such as 32P, 3H, 14C or the like. Other detectable substances which may be used
include antigens that are recognized by a specific labelled antibody, fluorescent compounds,
enzymes, antibodies specific for a labelled antigen, and luminescent compounds. An
appropriate label may be selected having regard to the rate of hybridization and binding of
the probe to the nucleotide to be detected and the amount of nucleotide available for
hybridization. Labelled probes may be hybridized to nucleic acids on solid supports such as
nitrocellulose filters or nylon membranes as generally described in Sambrook et al., 1989
Molecular Cloning, A Laboratory Manual (2nd ed.). The nucleic acid probes may be used to
detect genes, preferably in human cells, that encode SHIP, and SHIP related proteins. The
nucleotide probes may therefore be useful in the diagnosis of disorders of the hemopoietic
system including chronic myelogenous leukemia, and acute lymphocytic leukemia, etc.
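
By way of illustration, the requirement that a probe be based on a sequence encoding at least 6 sequential amino acids corresponds to a coding stretch of at least 18 nucleotides, and the candidate 6-residue stretches of a peptide may be enumerated with a sliding window. The sketch below uses the 15-residue peptide recited above; the function name is hypothetical.

```python
def peptide_windows(peptide: str, size: int = 6) -> list[str]:
    """Return every run of `size` sequential residues, in order of position."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # A peptide shorter than `size` yields an empty list.
    return [peptide[i:i + size] for i in range(len(peptide) - size + 1)]


peptide = "VPAEGVSSLNEMINP"          # encoded by nucleotides 2830 to 2874
windows = peptide_windows(peptide)   # candidate 6-residue probe targets
min_probe_length = 3 * 6             # nucleotides needed to encode 6 residues
```

The first and last windows produced for this peptide are VPAEGV and NEMINP, the two hexapeptides named above as examples.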

SHIP or a SHIP related protein of the invention can be used to prepare antibodies
specific for the proteins. Antibodies can be prepared which bind a distinct epitope in an
unconserved region of the protein. An unconserved region of the protein is one which does not
have substantial sequence homology to other proteins, for example the regions outside the
well-characterized regions of SHIP as described herein. Alternatively, a region from one of
the well-characterized domains (e.g. SH2 domain) can be used to prepare an antibody to a
conserved region of SHIP or a SHIP related protein. Antibodies having specificity for SHIP or
a SHIP related protein may also be raised from fusion proteins created by expressing, for
example, trpE-SHIP fusion proteins in bacteria as described herein.

Conventional methods can be used to prepare the antibodies. For example, by using a 
peptide of SHIP or a SHIP related protein, polyclonal antisera or monoclonal antibodies can be 
made using standard methods. A mammal, (e.g., a mouse, hamster, or rabbit) can be immunized 
with an immunogenic form of the peptide which elicits an antibody response in the mammal. 
Techniques for conferring immunogenicity on a peptide include conjugation to carriers or other 
techniques well known in the art. For example, the peptide can be administered in the 
presence of adjuvant. The progress of immunization can be monitored by detection of antibody 
titers in plasma or serum. Standard ELISA or other immunoassay procedures can be used with 
the immunogen as antigen to assess the levels of antibodies. Following immunization, antisera 
can be obtained and, if desired, polyclonal antibodies isolated from the sera. 

To produce monoclonal antibodies, antibody producing cells (lymphocytes) can be
harvested from an immunized animal and fused with myeloma cells by standard somatic cell
fusion procedures thus immortalizing these cells and yielding hybridoma cells. Such
techniques are well known in the art (e.g., the hybridoma technique originally developed by
Kohler and Milstein (Nature 256, 495-497 (1975)) as well as other techniques such as the
human B-cell hybridoma technique (Kozbor et al., Immunol. Today 4, 72 (1983)), the
EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al. Monoclonal
Antibodies in Cancer Therapy (1985) Alan R. Liss, Inc., pages 77-96), and screening of
combinatorial antibody libraries (Huse et al., Science 246, 1275 (1989))). Hybridoma cells can be
screened immunochemically for production of antibodies specifically reactive with the
peptide and the monoclonal antibodies can be isolated. Therefore, the invention also
contemplates hybridoma cells secreting monoclonal antibodies with specificity for SHIP or a
SHIP related protein as described herein.

The term "antibody" as used herein is intended to include fragments thereof which
also specifically react with a protein, or peptide thereof, having the activity of SHIP.
Antibodies can be fragmented using conventional techniques and the fragments screened for
utility in the same manner as described above. For example, F(ab')2 fragments can be
generated by treating antibody with pepsin. The resulting F(ab')2 fragment can be treated to
reduce disulfide bridges to produce Fab' fragments.

Chimeric antibody derivatives, i.e., antibody molecules that combine a non-human
animal variable region and a human constant region are also contemplated within the scope of
the invention. Chimeric antibody molecules can include, for example, the antigen binding
domain from an antibody of a mouse, rat, or other species, with human constant regions.
Conventional methods may be used to make chimeric antibodies containing the immunoglobulin
variable region which recognizes the gene product of SHIP antigens of the invention (See, for
example, Morrison et al., Proc. Natl. Acad. Sci. U.S.A. 81, 6851 (1985); Takeda et al., Nature
314, 452 (1985); Cabilly et al., U.S. Patent No. 4,816,567; Boss et al., U.S. Patent No.
4,816,397; Tanaguchi et al., European Patent Publication EP171496; European Patent
Publication 0173494; United Kingdom patent GB 2177096B). It is expected that chimeric
antibodies would be less immunogenic in a human subject than the corresponding non-chimeric
antibody.

Monoclonal or chimeric antibodies specifically reactive with a protein of the
invention as described herein can be further humanized by producing human constant region
chimeras, in which parts of the variable regions, particularly the conserved framework
regions of the antigen-binding domain, are of human origin and only the hypervariable regions
are of non-human origin. Such immunoglobulin molecules may be made by techniques known in
the art (e.g., Teng et al., Proc. Natl. Acad. Sci. U.S.A., 80, 7308-7312 (1983); Kozbor et al.,
Immunology Today, 4, 72-79 (1983); Olsson et al., Meth. Enzymol., 92, 3-16 (1982), and PCT
Publication WO92/06193 or EP 0239400). Humanized antibodies can also be commercially
produced (Scotgen Limited, 2 Holly Road, Twickenham, Middlesex, Great Britain).

Specific antibodies, or antibody fragments, reactive against proteins of the invention
may also be generated by screening expression libraries encoding immunoglobulin genes, or
portions thereof, expressed in bacteria with peptides produced from the nucleic acid molecules
of the present invention. For example, complete Fab fragments, VH regions and FV regions can
be expressed in bacteria using phage expression libraries (See for example Ward et al., Nature
341, 544-546 (1989); Huse et al., Science 246, 1275-1281 (1989); and McCafferty et al., Nature
348, 552-554 (1990)). Alternatively, a SCID-hu mouse, for example the model developed by
Genpharm, can be used to produce antibodies, or fragments thereof.

Antibodies specifically reactive with SHIP or a SHIP related protein, or derivatives 
thereof, such as enzyme conjugates or labeled derivatives, may be used to detect SHIP in 
various biological materials, for example they may be used in any known immunoassays which 
rely on the binding interaction between an antigenic determinant of SHIP or a SHIP related 
protein, and the antibodies. Examples of such assays are radioimmunoassays, enzyme 
immunoassays (e.g. ELISA), immunofluorescence, immunoprecipitation, latex agglutination,
hemagglutination, and histochemical tests. Thus, the antibodies may be used to detect and 
quantify SHIP in a sample in order to determine its role in particular cellular events or 
pathological states, and to diagnose and treat such pathological states. 

In particular, the antibodies of the invention may be used in immuno-histochemical
analyses, for example, at the cellular and subcellular level, to detect SHIP, to localise it to
particular cells and tissues and to specific subcellular locations, and to quantitate the level of
expression.

Cytochemical techniques known in the art for localizing antigens using light and
electron microscopy may be used to detect SHIP. Generally, an antibody of the invention may
be labelled with a detectable substance and SHIP may be localised in tissue based upon the
presence of the detectable substance. Examples of detectable substances include various
enzymes, fluorescent materials, luminescent materials and radioactive materials. Examples of
suitable enzymes include horseradish peroxidase, biotin, alkaline phosphatase,
β-galactosidase, or acetylcholinesterase; examples of suitable fluorescent materials include
umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine
fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes
luminol; and examples of suitable radioactive material include radioactive iodine I125, I131 or
tritium. Antibodies may also be coupled to electron dense substances, such as ferritin or
colloidal gold, which are readily visualised by electron microscopy.

Indirect methods may also be employed in which the primary antigen-antibody
reaction is amplified by the introduction of a second antibody, having specificity for the
antibody reactive against SHIP. By way of example, if the antibody having specificity
against SHIP is a rabbit IgG antibody, the second antibody may be goat anti-rabbit
gamma-globulin labelled with a detectable substance as described herein.

Where a radioactive label is used as a detectable substance, SHIP may be localized by
radioautography. The results of radioautography may be quantitated by determining the
density of particles in the radioautographs by various optical methods, or by counting the
grains.

As discussed herein, SHIP associates with Shc following cytokine stimulation of
hemopoietic cells, and it has a role in regulating proliferation, differentiation, activation and
metabolism of cells of the hemopoietic system. Therefore, the above described methods for
detecting nucleic acid molecules of the invention and SHIP can be used to monitor
proliferation, differentiation, activation and metabolism of cells of the hemopoietic system by
detecting and localizing SHIP and nucleic acid molecules encoding SHIP. It would also be
apparent to one skilled in the art that the above described methods may be used to study the
developmental expression of SHIP and, accordingly, will provide further insight into the role
of SHIP in the hemopoietic system.

SHIP has unique and important roles in the regulation of signalling pathways that 
control gene expression, cell proliferation, differentiation, activation, and metabolism. This 
finding permits the identification of substances which affect SHIP regulatory systems and 
which may be used in the treatment of conditions involving perturbation of signalling 
pathways. The term "SHIP regulatory system" refers to the interaction of SHIP or a SHIP
related protein and Shc or a part thereof, to form a SHIP-Shc complex thereby activating a
series of regulatory pathways that control gene expression, cell division, cytoskeletal
architecture and cell metabolism. Such pathways include the Ras pathway, the pathway
that regulates the breakdown of polyphosphoinositides through phospholipase C, and
PI-3-kinase activated pathways, such as the emerging rapamycin-sensitive protein kinase B
(PKB/Akt) pathway.

A substance which affects SHIP and accordingly a SHIP regulatory system may be 
assayed using the above described methods for detecting nucleic acid molecules and SHIP and 
SHIP related proteins, and by comparing the pattern and level of expression of SHIP or SHIP 
related proteins in the presence and absence of the substance. 

Substances which affect SHIP can also be identified based on their ability to bind to 
SHIP or a SHIP related protein. Therefore, the invention also provides methods for 
identifying substances which are capable of binding to SHIP or a SHIP related protein. In 
particular, the methods may be used to identify substances which are capable of binding to, 
and in some cases activating (i.e., phosphorylating) SHIP or a SHIP related protein of the 
invention. 

Substances which can bind with SHIP or a SHIP related protein of the invention may
be identified by reacting SHIP or a SHIP related protein with a substance which potentially
binds to SHIP or a SHIP related protein, under conditions which permit the formation of
substance-SHIP or substance-SHIP related protein complexes, and assaying for complexes, for free
substance, for non-complexed SHIP or SHIP related protein, or for activation of SHIP or
SHIP related protein. Conditions which permit the formation of substance-SHIP or substance-SHIP
related protein complexes may be selected having regard to factors such as the nature and
amounts of the substance and the protein.






The substance-protein complex, free substance or non-complexed proteins may be
isolated by conventional isolation techniques; for example, salting out, chromatography,
electrophoresis, gel filtration, fractionation, absorption, polyacrylamide gel electrophoresis,
agglutination, or combinations thereof. To facilitate the assay of the components, antibody
against SHIP or SHIP related protein or the substance, or labelled SHIP or SHIP related
protein, or a labelled substance may be utilized. The antibodies, proteins, or substances may be
labelled with a detectable substance as described above.

Substances which bind to and activate SHIP or a SHIP related protein of the invention
may be identified by assaying for phosphorylation of the tyrosine residues of the protein, for
example using antiphosphotyrosine antibodies and labelled phosphorus.

SHIP or SHIP related protein, or the substance used in the method of the invention
may be insolubilized. For example, SHIP or SHIP related protein or substance may be bound to
a suitable carrier. Examples of suitable carriers are agarose, cellulose, dextran, Sephadex,
Sepharose, carboxymethyl cellulose, polystyrene, filter paper, ion-exchange resin, plastic
film, plastic tube, glass beads, polyamine-methyl vinyl-ether-maleic acid copolymer, amino
acid copolymer, ethylene-maleic acid copolymer, nylon, silk, etc. The carrier may be in the
shape of, for example, a tube, test plate, beads, disc, sphere etc.

The insolubilized protein or substance may be prepared by reacting the material with
a suitable insoluble carrier using known chemical or physical methods, for example, cyanogen
bromide coupling.

The proteins or substance may also be expressed on the surface of a cell using the 
methods described herein. 

The invention also contemplates a method for assaying for an agonist or antagonist of 
the binding of SHIP or a SHIP related protein with a substance which is capable of binding 
with SHIP or a SHIP related protein. The agonist or antagonist may be an endogenous 
physiological substance or it may be a natural or synthetic substance. Substances which are 
capable of binding with SHIP or a SHIP related protein may be identified using the methods 
set forth herein. In a preferred embodiment, the substance is Shc, or a part of Shc, in particular
the SH2 domain of Shc, PTB recognition sequences of Shc, or the region containing Y317 of Shc
(i.e. amino acids 310 to 322) or an activated form thereof. The nucleic acid sequence and the
amino acid sequence of Shc are shown in Figures 7 & 8 (SEQ ID. Nos. 3 and 4), respectively.
Shc, or a part of Shc, may be prepared using conventional methods, or they may be prepared as
fusion proteins (See Lioubin, M.N. et al., Mol. Cell. Biol. 14(9):5682, 1994, and Kavanaugh,
W. M., and L.T. Williams, Science 266:1862, 1994 for methods for making Shc and Shc fusion
proteins). Shc, or part of Shc, may be activated, i.e. phosphorylated, using the methods
described for example by Reedijk et al. (The EMBO Journal, 11(4):1365, 1992) for producing a
tyrosine phosphorylated protein. The substance may also be an SH3 containing protein such as
Grb2, or a part of Grb2, in particular the SH3 domain of Grb2. The nucleic acid sequence and the
amino acid sequence of Grb2 are shown in Figure 9 (SEQ. ID. NOS. 5 and 6, respectively).

Therefore, in accordance with a preferred embodiment, a method is provided which
comprises providing a known concentration of SHIP or a SHIP related protein, incubating SHIP
or the SHIP related protein with Shc, or a part of Shc, and a suspected agonist or antagonist
under conditions which permit the formation of Shc-SHIP or Shc-SHIP related protein
complexes, and assaying for Shc-SHIP or Shc-SHIP related protein complexes, for free Shc, for
non-complexed SHIP or SHIP related proteins, or for activation of SHIP or SHIP related
proteins. Conditions which permit the formation of Shc-SHIP or Shc-SHIP related protein
complexes and methods for assaying for Shc-SHIP or Shc-SHIP related protein complexes, for
free Shc, for non-complexed SHIP or SHIP related protein, or for activation of SHIP or SHIP
related protein are described herein.

It will be understood that the agonists and antagonists that can be assayed using the
methods of the invention may act on one or more of the binding sites on the protein or substance
including agonist binding sites, competitive antagonist binding sites, non-competitive
antagonist binding sites or allosteric sites.

The invention also makes it possible to screen for antagonists that inhibit the effects
of an agonist of the interaction of SHIP or a SHIP related protein with a substance which is
capable of binding to SHIP or a SHIP related protein. Thus, the invention may be used to assay
for a substance that competes for the same binding site of SHIP or a SHIP related protein.

The methods described above may be used to identify a substance which is capable
of binding to an activated SHIP or SHIP related protein, and to assay for an agonist or
antagonist of the binding of activated SHIP or SHIP related protein with a substance which is
capable of binding with activated SHIP or activated SHIP related protein. An activated (i.e.
phosphorylated) SHIP or SHIP related protein may be prepared using the methods described
for example in Reedijk et al., The EMBO Journal, 11(4):1365, 1992 for producing a tyrosine
phosphorylated protein.

It will also be appreciated that intracellular substances which are capable of binding
to SHIP or a SHIP related protein may be identified using the methods described herein. For
example, tyrosine phosphorylated proteins (such as the 97 kd and 75 kd proteins) and
non-tyrosine phosphorylated proteins which bind to SHIP or a SHIP related protein may be
isolated using the method of the invention, cloned, and sequenced.

The invention also contemplates a method for assaying for the effect of a substance on
the phosphoIns-5-ptase activity of SHIP or a SHIP related protein having phosphoIns-5-ptase
activity, comprising reacting a substrate which is capable of being hydrolyzed by SHIP
or SHIP related protein to produce a hydrolysis product, with a substance which is suspected of
affecting the phosphoIns-5-ptase activity of SHIP or a SHIP related protein, under conditions
which permit the hydrolysis of the substrate, determining the amount of hydrolysis product,
and comparing the amount of hydrolysis product obtained with the amount obtained in the
absence of the substance to determine the effect of the substance on the phosphoIns-5-ptase
activity of SHIP or SHIP related proteins. Suitable substrates include phosphatidylinositol
trisphosphate (PtdIns-3,4,5-P3) and inositol tetraphosphate (Ins-1,3,4,5-P4). The former
substrate is hydrolyzed to PtdIns-3,4-P2, which may be identified by incubation with
phosphoIns-4-ptase, which converts the bisphosphate product to PtdIns-3-P. The latter is
hydrolyzed to Ins-1,3,4-P3, which is identified by treatment with phosphoIns-1-ptase and
phosphoIns-4-ptase. Conditions which permit the hydrolysis of the substrate may be
selected having regard to factors such as the nature and amounts of the substance, substrate,
and the amount of SHIP or SHIP related proteins.
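
The comparison described above reduces to simple arithmetic on the amount of hydrolysis product measured in the presence and in the absence of the test substance. A minimal Python sketch is given below. The function name, the optional background (no-enzyme blank) subtraction and all numerical values are illustrative assumptions and are not data from the present disclosure.

```python
def percent_change_in_activity(product_with, product_without, blank=0.0):
    """Return the percent change in hydrolysis product caused by a substance.

    product_with    -- product measured in the presence of the substance
    product_without -- product measured in the absence of the substance
    blank           -- product measured without enzyme (assumed background)
    All three values must be in the same units (e.g. cpm).
    """
    control = product_without - blank
    treated = product_with - blank
    if control <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * (treated - control) / control


# Hypothetical counts (cpm), chosen only to illustrate the calculation.
change = percent_change_in_activity(product_with=650.0,
                                    product_without=1250.0,
                                    blank=50.0)
print(f"{change:.1f}% change relative to control")
```

A negative value would indicate a substance that reduces the 5-ptase activity under the chosen conditions, and a positive value one that increases it.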

The invention further provides a method for assaying for a substance that affects a
SHIP regulatory pathway comprising administering to a non-human animal or to a tissue of an
animal, a substance suspected of affecting a SHIP regulatory pathway, and quantitating SHIP
or nucleic acids encoding SHIP, or examining the pattern and/or level of expression of SHIP, in
the non-human animal or tissue. SHIP may be quantitated and its expression may be examined
using the methods described herein.

The substances identified by the methods described herein may be used for
modulating SHIP regulatory pathways and accordingly may be used in the treatment of
conditions involving perturbation of SHIP signalling pathways. In particular, the substances
may be useful in the treatment of disorders of the hemopoietic system such as
chronic myelogenous leukemia and acute lymphocytic leukemia.

SHIP is believed to enhance proliferation. Therefore, inhibitors of SHIP (e.g.
truncated or point mutants or anti-sense) may be useful in reversing disorders involving
excessive proliferation, and stimulators of SHIP may be useful in the treatment of disorders
requiring stimulation of proliferation. Accordingly, the substances identified using the
methods of the invention may be used to stimulate or inhibit cell proliferation associated with
disorders including various forms of cancer such as leukemias, lymphomas (Hodgkins and
non-Hodgkins), sarcomas, melanomas, adenomas, carcinomas of solid tissue, hypoxic tumors,
squamous cell carcinomas of the mouth, throat, larynx, and lung, genitourinary cancers such as
cervical and bladder cancer, hematopoietic cancers, head and neck cancers, and nervous system
cancers, benign lesions such as papillomas, arthrosclerosis, angiogenesis, and viral infections,
in particular HIV infections; and autoimmune diseases including systemic lupus erythematosus,
Wegener's granulomatosis, rheumatoid arthritis, sarcoidosis, polyarthritis, pemphigus,
pemphigoid, erythema multiforme, Sjogren's syndrome, inflammatory bowel disease, multiple
sclerosis, myasthenia gravis, keratitis, scleritis, Type I diabetes, insulin-dependent diabetes
mellitus, lupus nephritis, and allergic encephalomyelitis. Substances which stimulate cell
proliferation identified using the methods of the invention may be useful in the treatment of
conditions involving damaged cells including conditions in which degeneration of tissue occurs
such as arthropathy, bone resorption, inflammatory disease, degenerative disorders of the
central nervous system; and for promoting wound healing. The SH2 domain of SHIP has been
found to be important for tyrosine phosphorylation, binding to Shc, and for translocation to
membranes. The SH2 domain has also been shown to be important in the viability of various
haemopoietic cells. Therefore, substances which enhance or inhibit SHIP may affect viability
of haemopoietic cells, and they may be useful in preventing or treating conditions requiring
enhancement or inhibition of viability of haemopoietic cells.

The substances may be formulated into pharmaceutical compositions for administration
to subjects in a biologically compatible form suitable for administration in vivo. By
"biologically compatible form suitable for administration in vivo" is meant a form of the
substance to be administered in which any toxic effects are outweighed by the therapeutic
effects. The substances may be administered to living organisms including humans and
animals. Administration of a therapeutically active amount of the pharmaceutical
compositions of the present invention is defined as an amount effective, at dosages and for
periods of time necessary to achieve the desired result. For example, a therapeutically active
amount of a substance may vary according to factors such as the disease state, age, sex, and
weight of the individual, and the ability of antibody to elicit a desired response in the
individual. Dosage regimens may be adjusted to provide the optimum therapeutic response. For
example, several divided doses may be administered daily or the dose may be proportionally
reduced as indicated by the exigencies of the therapeutic situation.

The active substance may be administered in a convenient manner such as by injection
(subcutaneous, intravenous, etc.), oral administration, inhalation, transdermal application, or
rectal administration. Depending on the route of administration, the active substance may be
coated in a material to protect the compound from the action of enzymes, acids and other
natural conditions which may inactivate the compound.

The compositions described herein can be prepared by per se known methods for the
preparation of pharmaceutically acceptable compositions which can be administered to
subjects, such that an effective quantity of the active substance is combined in a mixture with a
pharmaceutically acceptable vehicle. Suitable vehicles are described, for example, in
Remington's Pharmaceutical Sciences (Remington's Pharmaceutical Sciences, Mack Publishing
Company, Easton, Pa., USA 1985). On this basis, the compositions include, albeit not
exclusively, solutions of the substances in association with one or more pharmaceutically
acceptable vehicles or diluents, and contained in buffered solutions with a suitable pH and
iso-osmotic with the physiological fluids.

The reagents suitable for applying the methods of the invention to identify substances
that affect a SHIP regulatory system may be packaged into convenient kits providing the
necessary materials packaged into suitable containers. The kits may also include suitable
supports useful in performing the methods of the invention.

The invention also provides methods for examining the function of the SHIP protein.
Cells, tissues, and non-human animals lacking in SHIP expression or partially lacking in SHIP
expression may be developed using recombinant expression vectors of the invention having
specific deletion or insertion mutations in the SHIP gene. For example, the PTB recognition
sequences, SH2 domain, 5-ptase domain, or proline-rich sequences may be deleted. A
recombinant expression vector may be used to inactivate or alter the endogenous gene by
homologous recombination, and thereby create a SHIP deficient cell, tissue or animal.

Null alleles may be generated in cells, such as embryonic stem cells, by deletion
mutation. A recombinant SHIP gene may also be engineered to contain an insertion mutation
which inactivates SHIP. Such a construct may then be introduced into a cell, such as an
embryonic stem cell, by a technique such as transfection, electroporation, injection etc. Cells
lacking an intact SHIP gene may then be identified, for example by Southern blotting,
Northern blotting or by assaying for expression of SHIP using the methods described herein.
Such cells may then be fused to embryonic stem cells to generate transgenic non-human animals
deficient in SHIP. Germline transmission of the mutation may be achieved, for example, by
aggregating the embryonic stem cells with early stage embryos, such as 8 cell embryos, in vitro;
transferring the resulting blastocysts into recipient females; and generating germline
transmission of the resulting aggregation chimeras. Such a mutant animal may be used to
define specific cell populations, developmental patterns and in vivo processes normally
dependent on SHIP expression.

The following non-limiting examples are illustrative of the present invention:

EXAMPLES 

The following materials and methods were utilized in the investigations outlined in 
example 1: 

PURIFICATION PROTOCOL

20 litres of B6SUtA1 cells, grown to confluence in RPMI containing 10% FCS and 5 ng/ml
of GM-CSF, were lysed at 2 x 10^7 cells/ml with PSB containing 0.5% NP40 (Liu et al., Mol. Cell.
Biol. 14, 6926 (1994)) and incubated with GSH-beads bearing GST-Grb2-C-SH3. Bound
material was eluted by boiling with 1% SDS, 50 mM Tris-Cl, pH 7.5, and diluted to reduce the
SDS to < 0.2% for Amicon YM100, Microcon 30 concentration and 3 rounds of Bio-Sep SEC S3000
(Phenomenex) HPLC to remove GST-Grb2-C-SH3 and other low molecular weight material.
Following 2D-PAGE (P.H. O'Farrell, J. Biol. Chem. 250, 4007 (1975)), transfer to a PVDF
membrane (Liu et al., Mol. Cell. Biol. 14, 6926 (1994)), and Ponceau S staining, the 145-kD
spot was excised and sent to the Harvard Microchemistry Facility for trypsin digestion, C18
HPLC and amino acid sequencing.

CLONING OF cDNA FOR p145

Degenerate 3' oligonucleotides were synthesized based on the peptide sequence
NEMINP, ie 5'-GACATCGATGG(G,A)TT(T,G,A)ATCAT(C,T)TC(A,G)TT-3', to carry out PCR
amplification 3' and 5' from a plasmid library of randomly primed B6SUtA1 cDNA employing
5' PCR primers based on plasmid vector sequence flanking the cDNA insertion site. PCR
reactions (Expand™ Long Template PCR System, Boehringer Mannheim) were separated on
TAE-agarose gels, transferred to Hybond-N+ Blotting membrane (Amersham) and probed for
hybridizing bands with a γ-32P-dATP end-labelled degenerate oligonucleotide based on the
upstream, but not overlapping, peptide sequence
VPAEGV, 5'-GTAACGGGKCX^OCCCCT^OGC (CXA,G)GA(A,G)G(C,T,A,G)GT-3'. A
314 bp hybridizing DNA fragment was identified, gel purified, subcloned into Bluescript KS+,
sequenced and the projected translation confirmed to match that of the original amino acid
sequence obtained with the exception of E→C at amino acid #4: VPAEGVSSLNEMINP.
Specific primers were synthesized based on the DNA sequence to proceed both 3' and 5' of the
314 bp original clone to clone 3 overlapping cDNAs totalling 4047 bp in length and encoding a
complete coding sequence of 1190 amino acids. DNA sequence was obtained for both strands
(Amplicycle, Perkin Elmer), employing both subcloning and oligomer primers. Data base
comparisons were performed with the MPSearch program, using the Blitz server operated by
the European Molecular Biology Laboratory (Heidelberg, Germany).
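
The relationship between the peptide NEMINP and the degenerate 3' oligonucleotide given above can be checked computationally. The following Python sketch expands the degenerate positions, takes the reverse complement of each variant and translates it. It assumes, for illustration only, that the first nine nucleotides (GACATCGAT) form a non-coding 5' tail and that the remaining 17 nucleotides are antisense to the coding strand. The helper names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(degenerate):
    """Expand 'GG(G,A)TT' style notation into every concrete sequence."""
    tokens = re.findall(r"\(([^)]*)\)|([ACGT])", degenerate)
    choices = [alts.split(",") if alts else [base] for alts, base in tokens]
    return ["".join(p) for p in product(*choices)]


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


PRIMER = "GACATCGATGG(G,A)TT(T,G,A)ATCAT(C,T)TC(A,G)TT"
TAIL_LENGTH = 9  # assumption: GACATCGAT is a non-coding 5' tail

variants = expand(PRIMER)
coding = [reverse_complement(v[TAIL_LENGTH:]) for v in variants]
encoded = {translate(c) for c in coding}
print(len(variants), encoded)  # prints: 24 {'NEMIN'}
```

Under these assumptions every one of the 24 variants encodes NEMIN followed by the dinucleotide CC, which is the start of a proline codon (CCN), in agreement with the peptide NEMINP.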

Determining If p145 Is A PhosphoIns-5-ptase

PtdIns[32P]-3,4,5-P3 was prepared using PtdIns-4,5-P2 and recombinant PtdIns-3-kinase
provided by Dr. L. Williams (Chiron Corp) (17). 5-ptase activity was measured by
evaporating 30,000 cpm of TLC purified PtdIns[32P]-3,4,5-P3 with 150 µg phosphatidylserine
under N2 and resuspending by sonication in assay buffer. Reaction mixtures (25 µl) containing
immunoprecipitate or 5-ptase II, 50 mM Tris-Cl, pH 7.5, 10 mM MgCl2 and substrate were
rocked for 30 min at 37°C. Reactions were stopped and the product separated by TLC (F.A.
Norris and P.W. Majerus, J. Biol. Chem. 269, 8716 (1994)). Hydrolysis of [3H]Ins-1,3,4,5-P4 by
immunoprecipitates was measured as above in 25 µl containing 16 µM [3H]Ins-1,3,4,5-P4 (6000
cpm/nmol) under conditions where the reaction was linear with time (20 min, 37°C) and
enzyme amount (C.A. Mitchell et al., J. Biol. Chem. 264, 8873 (1989)). Proof that the InsP3
product was [3H]Ins-1,3,4-P3 was obtained by incubation with recombinant
inositol-polyphosphate-4- and 1-phosphatase and separation of the bisphosphate products on
Dowex-formate.
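
As a worked illustration of the quantities stated above, the amount of [3H]Ins-1,3,4,5-P4 and the total radioactivity in each 25 µl reaction follow directly from the stated concentration and specific activity. The short Python sketch below performs this arithmetic. The function name is hypothetical, and the calculation assumes that the stated concentration refers to the final reaction volume.

```python
def substrate_per_reaction(conc_uM, volume_ul, cpm_per_nmol):
    """Return (nmol of substrate, total cpm) for one reaction.

    µM x µl gives pmol; dividing by 1000 converts pmol to nmol.
    """
    nmol = conc_uM * volume_ul / 1000.0
    return nmol, nmol * cpm_per_nmol


nmol, cpm = substrate_per_reaction(conc_uM=16.0,
                                   volume_ul=25.0,
                                   cpm_per_nmol=6000.0)
print(f"{nmol:.2f} nmol substrate, {cpm:.0f} cpm per reaction")
```

With the values given in the text this corresponds to 0.4 nmol of substrate, or about 2400 cpm, per reaction.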

LEGENDS FOR FIGURES DISCUSSED IN EXAMPLE 1 

Figure 1. The Grb2-C-SH3 domain specifically binds the tyrosine phosphorylated,
Shc-associated p145. Lysates prepared from B6SUtA1 cells (2), treated ± IL-3, were either
immunoprecipitated with anti-Shc (Transduction Laboratories), followed by protein A
Sepharose (lanes 1&2) or incubated with GSH bead bound GST-Grb2-N-SH3 (lanes 3&4) or
GSH bead bound GST-Grb2-C-SH3 (lanes 5&6). Proteins were eluted by boiling in SDS sample
buffer and subjected to Western analysis using 4G10. For lane 7, lysates from IL-3-stimulated
B6SUtA1 cells were incubated with GSH bead bound GST-Grb2-C-SH3, and anti-Shc
immunoprecipitates carried out with the unbound material.

Figure 2. Amino acid sequence of p145. (A) Deduced amino acid sequence of p145. The hatched
box indicates the SH2 domain; the heavily underlined amino acids, the 2 target sequences for
binding to PTB domains; the asterisks, the location of the proline rich motifs; and the lightly
underlined amino acids, the 2 conserved 5-ptase motifs. Data base comparisons were
performed with the MPSearch program using the Blitz server operated by the European
Molecular Biology Laboratory (Heidelberg, Germany). (B) Diagrammatic representation of
the various domains within p145.

Figure 4. Anti-15mer antiserum recognizes the Shc-associated p145 and co-precipitates Shc.
(A) Lysates from B6SUtA1 cells, treated ± IL-3, were either immunoprecipitated with
anti-Shc (lanes 1&2), NRS (lanes 3&4) or anti-15mer (lanes 5&6) or precleared with anti-15mer and
then immunoprecipitated with anti-Shc (lanes 7&8). Western analysis was then performed
with 4G10. (B) Lysates from B6SUtA1 cells, stimulated with IL-3, were immunoprecipitated
with anti-Shc or anti-15mer and the bound proteins eluted at 23°C for 30 min with SDS-sample
buffer containing 1 mM N-ethylmaleimide in lieu of 2-mercaptoethanol. Western blotting was
then carried out with 4G10 (upper panel) and the blot reprobed with anti-Shc (lower panel).

Figure 5. Expression of p145 RNA in murine tissues. Northern blot analysis of 2 µg of polyA
RNA from various tissues probed with a random primer-labeled PCR fragment encompassing a
1.5-kb fragment corresponding to the 3' end of the p145 cDNA (lanes 1-6, spleen, lung, liver,
skeletal muscle, kidney and testes, respectively (Clontech); lane 7, separately prepared blot of
bone marrow). Similar intensities were observed upon probing with a random primer-labeled
PCR fragment encompassing a 1.5-kb fragment corresponding to the 5' end. Exposure time was
30 hrs. In addition to the prominent 5-kb band, a faint band of 4.5-kb was apparent on the
autoradiogram.

Figure 6. p145 contains Ins-1,3,4,5-P4 and PtdIns-3,4,5-P3 5-phosphatase activity. (A) 2 x 10^7
B6SUtA1 cells were lysed and anti-15mer, anti-Shc and NRS immunoprecipitates incubated
with [3H]Ins-1,3,4,5-P4 under conditions where product formation was linear with time.
Assays were also carried out ± recombinant 5-ptase II as controls. (B) 1/10th of anti-15mer,
NRS and anti-Shc immunoprecipitates (as well as ± recombinant 5-ptase II, ie.
PtII & BL (blank)) were incubated with PtdIns[32P]-3,4,5-P3 under conditions where product
formation was linear with time and the reaction mixture chromatographed on TLC (18).

EXAMPLE 1 

In preliminary studies aimed at purifying p145, immobilized GST fusion proteins
containing the C-terminal (but not the N-terminal) SH3 domain of Grb2 were found to bind a
prominent tyrosine phosphorylated protein doublet from B6SUtA1 cell lysates that possessed
the same mobility in SDS-gels as p145 (Figure 1, lanes 1-6). Silver stained gels of Grb2-C-SH3
bound material indicated this doublet was prominent in terms of protein level as well, and
most abundant in B6SUtA1 cells (compared to M07E, TF1, Ba/F3, DA-3 and 32D cells, data not
shown). To determine if this Grb2-C-SH3 purified doublet was p145, B6SUtA1 cell lysates
were precleared with Grb2-C-SH3 beads and this dramatically depleted p145 in subsequent
anti-Shc immuno-precipitates (Figure 1, lane 7). Further proof was obtained by carrying out
2D-PAGE (P.H. O'Farrell, J. Biol. Chem. 250, 4007 (1975)) with the two preparations,
followed by Western analysis, using anti-PY antibodies. An identical pattern of multiple spots
was obtained in the 145-kD range, with isoelectric points ranging from 7.2 to 7.8.

Based on these findings, a purification protocol was devised as described above and
two sequences were obtained from the purified protein: VPAEGVSSLNEMINP, which was used
to construct degenerate oligonucleotides, and DGSFLVR, which strongly suggested the presence
of an SH2 domain.

The full length cDNA for p145 was then cloned using a PCR based strategy and a
B6SUtA1 cDNA library as described above. The deduced 1190 amino acid sequence, possessing
a theoretical pI of 7.75 (consistent with the 2D-gel results), revealed several interesting motifs
(Figure 2). Close to the amino terminus is the DGSFLVR sequence that is highly conserved
among SH2 domains and, taken together with sequences surrounding this motif, suggests that
p145 contains an SH2 domain most homologous, at the protein level, to those within Abl,
Bruton's tyrosine kinase and Grb2. There are also two motifs, ie, INPNY and ENPLY, that, in
their phosphorylated forms, are theoretically capable of binding to PTB domains (P. Blaikie
et al., J. Biol. Chem. 269, 32031 (1994); W.M. Kavanaugh et al., Science 268, 1177 (1995); I.
Dikic et al., J. Biol. Chem. 270, 15125 (1995); P. Bork and B. Margolis, Cell 80, 693 (1995); Z.
Songyang et al., J. Biol. Chem. 270, 14863 (1995); A. Craparo et al., J. Biol. Chem. 270, 15639
(1995); P. van der Geer and T. Pawson, TIBS 20, 277 (1995); A.G. Batzer et al., Mol. Cell. Biol.
15, 4403 (1995); T. Trub et al., J. Biol. Chem. 270, 18205 (1995)). As well, several predicted
proline-rich motifs are present near the carboxy terminus, including both class I (eg,
PPSQPPLSP) and class II (eg, PVKPSR, PPLSPKK, PPLPVK) (K. Alexandropoulos et al., Proc.
Natl. Acad. Sci. U.S.A. 92, 3110 (1995); C. Schumacher et al., J. Biol. Chem. 270, 15341
(1995)). Most interestingly, there are 2 motifs that are highly conserved among 5-ptases, ie,
WLGDLNYR and, 73 amino acids C-terminal to this, KYNLPSWCDRVLW (X. Zhang et al.,
Proc. Natl. Acad. Sci. U.S.A. 92, 4853 (1995)).
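
The two candidate PTB domain target sequences named above, INPNY and ENPLY, both conform to the NPXY pattern (where X is any amino acid) that is generally described as the consensus for PTB domain binding sites. A minimal Python sketch of such a pattern check follows. The regular expression and the function name are illustrative assumptions and do not form part of the methods used in the Example.

```python
import re

# N-P-X-Y, where X is any single residue (one-letter amino acid code).
NPXY = re.compile(r"NP.Y")


def find_npxy(sequence):
    """Return (start position, matched text) for each NPXY occurrence."""
    return [(m.start(), m.group()) for m in NPXY.finditer(sequence)]


for motif in ("INPNY", "ENPLY"):
    print(motif, find_npxy(motif))
```

The same function may be applied to a full-length deduced amino acid sequence to list every NPXY occurrence together with its position.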

To identify tyrosine phosphorylated proteins that interact with p145 in vivo and to
confirm p145 had been sequenced, lysates from B6SUtA1 cells were immunoprecipitated with
rabbit antiserum (ie, anti-15mer) generated against the 15mer used for cloning (E. Harlow and D.
Lane, Antibodies, A Laboratory Manual. Cold Spring Harbor Laboratory, (1988)). Western
analysis, using anti-PY, revealed, as expected, a 145-kD tyrosine phosphorylated doublet
with an identical mobility in SDS gels to p145 (Figure 4(A), lanes 1&2 and 5&6). Pre-immune
serum did not immunoprecipitate this or any other tyrosine phosphorylated protein (Figure
4(A), lanes 3&4). Moreover, anti-Shc immunoprecipitates of lysates precleared with
anti-15mer no longer contained p145 (Figure 4(A), lane 8). Interestingly, anti-15mer
immunoprecipitates from lysates of IL-3-stimulated B6SUtA1 cells consistently contained
50-55-kD and, occasionally, 75- and 97-kD tyrosine phosphorylated proteins (Figure 4(A), lane 6).
The 50-55-kD protein was shown to be Shc by treating anti-15mer immunoprecipitates with
N-ethylmaleimide prior to SDS-PAGE to alter the mobility of the interfering IgH chain (M.R.
Block et al., Proc. Natl. Acad. Sci. U.S.A. 85, 7852 (1988)), and then carrying out Western
analysis with anti-PY (Figure 4(B), upper panel) and anti-Shc antibodies (Figure 4(B), lower
panel).

To examine whether the expression of p145 was restricted to hemopoietic cells,
Northern blot analysis was carried out with polyA purified RNA from various murine tissues.
A 5.0-kb p145 transcript was found to be expressed in bone marrow, lung, spleen, muscle, testes
and kidney, suggesting the presence of this protein in many cell types (Figure 5).

Lastly, to determine if p145 was indeed a 5-ptase, lysates from B6SUtA1 cells were
immunoprecipitated with anti-15mer, anti-Shc or normal rabbit serum (NRS) and the
immunoprecipitates tested with various 5-ptase substrates (X. Zhang et al., Proc. Natl. Acad.
Sci. U.S.A. 92, 4853 (1995) and as described herein). As can be seen in Figure 6(A), anti-15mer,
but not NRS, immunoprecipitates hydrolyzed [3H]Ins-1,3,4,5-P4 to [3H]Ins-1,3,4-P3. The
product of the reaction was shown to be [3H]Ins-1,3,4-P3 by incubation with recombinant
inositol-polyphosphate-1- and 4-phosphatases, followed by the separation of the
bisphosphate product on Dowex-formate (Zhang, X., et al., Proc. Natl. Acad. Sci. U.S.A.
92:4853-4856, 1995 and Jefferson, A.B. and Majerus, P.W., J. Biol. Chem. 270:9370-9377, 1995).
In the presence of 3 mM EDTA, no hydrolysis of [3H]Ins-1,3,4,5-P4 was observed, suggesting
that this 5-ptase is Mg++-dependent. Interestingly, no significant difference in activity was
observed between anti-15mer immunoprecipitates from stimulated and unstimulated cells.
Moreover, as one might expect, anti-Shc immunoprecipitates possessed 5-ptase activity, but
only after IL-3-stimulation. In addition, anti-15mer, but not NRS, immunoprecipitates
catalyzed the hydrolysis of PtdIns[32P]-3,4,5-P3, as did recombinant 5-ptase II (Figure 6(B)).
Once again there was no significant difference in activity between IL-3-stimulated and
unstimulated cells, and anti-Shc immunoprecipitates possessed 5-ptase activity only after cells
were stimulated. This suggests that IL-3 affects only the localization of p145 and not its
5-ptase activity. In studies with other 5-ptase substrates, anti-15mer immunoprecipitates did not
hydrolyse Ins-1,4,5-P3 or PtdIns-4,5-P2. The 5-ptase substrate specificity of p145 is therefore distinct
from that of other 5-ptases such as 5-ptase II, OCRL 5-ptase and a novel Mg++-independent
5-ptase (Zhang, X., et al., Proc. Natl. Acad. Sci. U.S.A. 92:4853-4856, 1995; Jefferson, A.B. and
Majerus, P.W., J. Biol. Chem. 270:9370-9377, 1995; and Jackson, S.P. et al., EMBO J. 14:4490-
4500, 1995).

Of the 5-ptases cloned to date (X. Zhang et al., Proc. Natl. Acad. Sci. U.S.A.
92, 4853 (1995)), p145 is the first to possess an SH2 domain and to be tyrosine phosphorylated.
Thus, p145 may play an important role in cytokine mediated signalling. In this regard, Cullen
et al. recently reported that Ins-1,3,4,5-P4, which is rapidly elevated in stimulated cells (I.R.
Batty et al., Biochem. J. 232, 211 (1985)), binds to and stimulates a member of the GAP1 family
(P.J. Cullen et al., Nature 376, 527 (1995)). It is therefore conceivable that p145, through its
association with Shc, regulates Ras activity by hydrolyzing RasGAP bound Ins-1,3,4,5-P4. In
addition, with its multiple protein:protein interaction domains and its unique 5-ptase
substrate specificity, p145 could play an important role in regulating Ca++-independent PKC
activity (Toker, A., et al., J. Biol. Chem. 269:32358-32367, 1994), the emerging Akt/PKB
pathway (Burgering, B.M. and Coffer, P.J., Nature 376:599-602, 1995) and other as yet
uncharacterized PI-3-kinase stimulated cascades. In terms of its association with Shc, p145
may interact via its phosphorylated tyrosines with the SH2 of Shc, via its phosphorylated
PTB recognition sequences with the PTB of Shc (as suggested by in vitro studies with the
Shc-associated p145 in 3T3 cells (F.A. Norris and P.W. Majerus, J. Biol. Chem. 269, 8716 (1994))),
and/or via its SH2 domain with Y317 of Shc.

In summary, a tyrosine phosphorylated 145 kDa protein that associates with Shc in
response to multiple cytokines has been purified from hemopoietic cells and shown to be
a novel, SH2-containing 5-ptase. Based on its properties, it is suggested that it be called SHIP, for
SH2-containing inositol-phosphatase.

EXAMPLE 2 

Cloning of hSHIP cDNA

Duplicate nitrocellulose (Schleicher & Schuell, Keene, NH) plaque-lifts were
prepared from approximately 1 x 10^6 pfu of a custom-made M07e/M07-ER λgt11 cDNA library
created from 10 µg of poly-A RNA (Clontech, Palo Alto, CA). Phage DNA bound to these
membranes was denatured and hybridized (1.5X SSPE, 1% SDS, 1% Blotto, 0.25 mg/ml ssDNA)
at 50°C for 18 hours with non-overlapping, [α-32P]dCTP randomly labeled cDNA fragments
corresponding to either 1.5 kb of the 5'-most region (including the SH2 domain) or 1.1 kb of the
central region (including the 5-ptase domain) of murine SHIP. Probed membranes were washed
three times with 0.5X SSC, 0.5% SDS at 50°C for 30 minutes each. Membranes were exposed to
Kodak X-Omat film (Rochester, NY), and plaques which hybridized with both probes were
identified and the phage isolated. Thirteen cDNA inserts were removed from "positive"
phage by EcoRI digestion, gel purified, and subcloned into pBluescript KS+ for further
analysis. One full-length cDNA, 4926 nt in length, was further digested with either PstI or
XhoI and re-subcloned into pBluescript KS+ for automated ABI/Taq Polymerase sequencing
(NAPS Unit, University of British Columbia, Vancouver, Canada) using standard T7 and T3
oligoprimers. Regions not overlapped by restriction fragments were sequenced using specific
nucleotide oligoprimers. The human SHIP cDNA sequence is set out in Figure 10 and in
SEQ.ID.NO.12.

Having illustrated and described the principles of the invention in a preferred
embodiment, it should be appreciated by those skilled in the art that the invention can be
modified in arrangement and detail without departure from such principles. We claim all
modifications coming within the scope of the following claims.

All publications, patents and patent applications referred to herein are incorporated
by reference in their entirety to the same extent as if each individual publication, patent or
patent application was specifically and individually indicated to be incorporated by reference
in its entirety.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Krystal, Gerald 

(B) STREET: 601 West 10th Street 

(C) CITY: Vancouver 

(D) STATE: British Columbia 

(E) COUNTRY: Canada 

(F) POSTAL CODE: V5Z 1L3

(ii) TITLE OF INVENTION: SH2-CONTAINING INOSITOL-PHOSPHATASE
(iii) NUMBER OF SEQUENCES: 8 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: BERESKIN & PARR 

(B) STREET: 40 KING STREET WEST 

(C) CITY: TORONTO 

(D) STATE: ONTARIO 

(E) COUNTRY: CANADA 

(F) ZIP: M5H 3Y2 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/CA96/00655

(B) FILING DATE: 27 SEPT 1996 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Kurdydyk, Linda M. 

(B) REGISTRATION NUMBER: 34,971 

(C) REFERENCE/DOCKET NUMBER: 7771-018

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 416-364-7311 

(B) TELEFAX: 416-361-1398 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4040 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: murine 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: mSHIP 

(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 139..3693

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CCCTGGTAGG AGCAGCAGAG GCAATTTCTG AGAGGCAACA GGCGGCAGGT CTCAGCCTAG 60 

AGAGGGCCCT GAACTACTTT GCTGGAGTGT CCGTCCTGGG AGTGGCTGCT GACCCAGTCC 120 

AGGAGACCCA TGCCTGCC ATG GTC CCT GGG TGG AAC CAT GGC AAC ATC ACC 171 

Met Val Pro Gly Trp Asn His Gly Asn Ile Thr
1 5 10

CGC TCC AAG GCA GAG GAG CTA CTT TCC AGA GCC GGC AAG GAC GGG AGC 219 
Arg Ser Lys Ala Glu Glu Leu Leu Ser Arg Ala Gly Lys Asp Gly Ser 

15 20 25

TTC CTT GTG CGT GCC AGC GAG TCC ATC CCC CGG GCC TGC GCA CTC TGC 267
Phe Leu Val Arg Ala Ser Glu Ser Ile Pro Arg Ala Cys Ala Leu Cys
30 35 40 

GTG CTG TTC CGG AAT TGT GTT TAC ACT TAC AGG ATT CTG CCC AAT GAG 315 
Val Leu Phe Arg Asn Cys Val Tyr Thr Tyr Arg Ile Leu Pro Asn Glu
45 50 55 

GAC GAT AAA TTC ACT GTT CAG GCA TCC GAA GGT GTC CCC ATG AGG TTC 363 
Asp Asp Lys Phe Thr Val Gln Ala Ser Glu Gly Val Pro Met Arg Phe
60 65 70 75

TTC ACG AAG CTG GAC CAG CTC ATC GAC TTT TAC AAG AAG GAA AAC ATG 411 
Phe Thr Lys Leu Asp Gln Leu Ile Asp Phe Tyr Lys Lys Glu Asn Met
80 85 90 

GGG CTG GTG ACC CAC CTG CAG TAC CCC GTG CCC CTG GAG GAG GAG GAT 459 
Gly Leu Val Thr His Leu Gln Tyr Pro Val Pro Leu Glu Glu Glu Asp
95 100 105 

GCT ATT GAT GAG GCT GAG GAG GAC ACT GAA AGT GTC ATG TCA CCA CCT 507 
Ala Ile Asp Glu Ala Glu Glu Asp Thr Glu Ser Val Met Ser Pro Pro
110 115 120 

GAG CTG CCT CCC AGA AAC ATT CCT ATG TCT GCC GGG CCC AGC GAG GCC 555 
Glu Leu Pro Pro Arg Asn Ile Pro Met Ser Ala Gly Pro Ser Glu Ala
125 130 135 

AAG GAC CTT CCT CTT GCA ACA GAG AAC CCC CGA GCC CCT GAG GTC ACC 603 
Lys Asp Leu Pro Leu Ala Thr Glu Asn Pro Arg Ala Pro Glu Val Thr 
140 145 150 155 

CGG CTG AGT CTC TCC GAG ACA CTG TTT CAG CGT CTA CAG AGC ATG GAT 651 
Arg Leu Ser Leu Ser Glu Thr Leu Phe Gln Arg Leu Gln Ser Met Asp
160 165 170 

ACC AGT GGG CTT CCC GAG GAG CAC CTG AAA GCC ATC CAG GAT TAT CTG 699 
Thr Ser Gly Leu Pro Glu Glu His Leu Lys Ala Ile Gln Asp Tyr Leu
175 180 185 

AGC ACT CAG CTC CTC CTG GAT TCC GAC TTT TTG AAA ACG GGC TCC AGC 747 
Ser Thr Gln Leu Leu Leu Asp Ser Asp Phe Leu Lys Thr Gly Ser Ser
190 195 200 

AAC CTC CCT CAC CTG AAG AAG CTG ATG TCA CTG CTC TGC AAG GAG CTC 795 
Asn Leu Pro His Leu Lys Lys Leu Met Ser Leu Leu Cys Lys Glu Leu 
205 210 215 

CAT GGG GAA GTC ATC AGG ACT CTG CCA TCC CTG GAG TCT CTG CAG AGG 843 
His Gly Glu Val Ile Arg Thr Leu Pro Ser Leu Glu Ser Leu Gln Arg
220 225 230 235

TTG TTT GAC CAA CAG CTC TCC CCA GGC CTT CGC CCA CGA CCT CAG GTG 891
Leu Phe Asp Gln Gln Leu Ser Pro Gly Leu Arg Pro Arg Pro Gln Val
240 245 250

CCC GGA GAG GCC AGT CCC ATC ACC ATG GTT GCC AAA CTC AGC CAA TTG 939
Pro Gly Glu Ala Ser Pro Ile Thr Met Val Ala Lys Leu Ser Gln Leu
255 260 265 

ACA AGT CTG CTG TCT TCC ATT GAA GAT AAG GTC AAG TCC TTG CTG CAC 987 
Thr Ser Leu Leu Ser Ser Ile Glu Asp Lys Val Lys Ser Leu Leu His
270 275 280 

GAG GGC TCA GAA TCT ACC AAC AGG CGT TCC CTT ATC CCT CCG GTC ACC 1035
Glu Gly Ser Glu Ser Thr Asn Arg Arg Ser Leu Ile Pro Pro Val Thr
285 290 295 

TTT GAG GTG AAG TCA GAG TCC CTG GGC ATT CCT CAG AAA ATG CAT CTC 1083 
Phe Glu Val Lys Ser Glu Ser Leu Gly Ile Pro Gln Lys Met His Leu
300 305 310 315

AAA GTG GAC GTT GAG TCT GGG AAA CTG ATC GTT AAG AAG TCC AAG GAT 1131 
Lys Val Asp Val Glu Ser Gly Lys Leu Ile Val Lys Lys Ser Lys Asp
320 325 330 

GGT TCT GAG GAC AAG TTC TAC AGC CAC AAA AAA ATC CTG CAG CTC ATT 1179
Gly Ser Glu Asp Lys Phe Tyr Ser His Lys Lys Ile Leu Gln Leu Ile
335 340 345 

AAG TCC CAG AAG TTT CTA AAC AAG TTG GTG ATT TTG GTG GAG ACG GAG 1227 
Lys Ser Gln Lys Phe Leu Asn Lys Leu Val Ile Leu Val Glu Thr Glu
350 355 360 

AAG GAG AAA ATC CTG AGG AAG GAA TAT GTT TTT GCT GAC TCT AAG AAA 1275 
Lys Glu Lys Ile Leu Arg Lys Glu Tyr Val Phe Ala Asp Ser Lys Lys
365 370 375 

AGA GAA GGC TTC TGT CAA CTC CTG CAG CAG ATG AAG AAC AAG CAT TCG 1323 
Arg Glu Gly Phe Cys Gln Leu Leu Gln Gln Met Lys Asn Lys His Ser
380 385 390 395 

GAG CAG CCA GAG CCT GAC ATG ATC ACC ATC TTC ATT GGC ACT TGG AAC 1371 
Glu Gln Pro Glu Pro Asp Met Ile Thr Ile Phe Ile Gly Thr Trp Asn
400 405 410

ATG GGT AAT GCA CCC CCT CCC AAG AAG ATC ACG TCC TGG TTT CTC TCC 1419 
Met Gly Asn Ala Pro Pro Pro Lys Lys Ile Thr Ser Trp Phe Leu Ser
415 420 425 

AAG GGG CAG GGA AAG ACA CGG GAC GAC TCT GCT GAC TAC ATC CCC CAT 1467 
Lys Gly Gln Gly Lys Thr Arg Asp Asp Ser Ala Asp Tyr Ile Pro His
430 435 440 

GAC ATC TAT GTG ATT GGC ACC CAG GAG GAT CCC CTT GGA GAG AAG GAG 1515 
Asp Ile Tyr Val Ile Gly Thr Gln Glu Asp Pro Leu Gly Glu Lys Glu
445 450 455 

TGG CTG GAG CTA CTC AGG CAC TCC CTG CAA GAA GTC ACC AGC ATG ACA 1563 
Trp Leu Glu Leu Leu Arg His Ser Leu Gln Glu Val Thr Ser Met Thr
460 465 470 475 

TTT AAA ACA GTT GCC ATC CAC ACC CTC TGG AAC ATT CGC ATA GTG GTG 1611 
Phe Lys Thr Val Ala Ile His Thr Leu Trp Asn Ile Arg Ile Val Val
480 485 490 

CTT GCC AAG CCA GAG CAT GAG AAT CGG ATC AGC CAT ATC TGC ACT GAC 1659 
Leu Ala Lys Pro Glu His Glu Asn Arg Ile Ser His Ile Cys Thr Asp
495 500 505

AAC GTG AAG ACA GGC ATC GCC AAC ACC CTG GGA AAC AAG GGA GCA GTG 1707
Asn Val Lys Thr Gly Ile Ala Asn Thr Leu Gly Asn Lys Gly Ala Val
510 515 520 

GGA GTG TCC TTC ATG TTC AAT GGA ACC TCC TTG GGG TTC GTC AAC AGC 1755 
Gly Val Ser Phe Met Phe Asn Gly Thr Ser Leu Gly Phe Val Asn Ser 
525 530 535 

CAC TTG ACT TCT GGA AGT GAA AAA AAG CTC AGG AGA AAT CAA AAC TAT 1803 
His Leu Thr Ser Gly Ser Glu Lys Lys Leu Arg Arg Asn Gln Asn Tyr
540 545 550 555 

ATG AAC ATC CTG CGG TTC CTG GCC CTG GGA GAC AAG AAG CTA AGC CCA 1851 
Met Asn Ile Leu Arg Phe Leu Ala Leu Gly Asp Lys Lys Leu Ser Pro
560 565 570 

TTT AAC ATC ACC CAC CGC TTC ACC CAC CTC TTC TGG CTT GGG GAT CTC 1899 
Phe Asn Ile Thr His Arg Phe Thr His Leu Phe Trp Leu Gly Asp Leu
575 580 585 

AAC TAC CGC GTG GAG CTG CCC ACT TGG GAG GCA GAG GCC ATC ATC CAG 1947 
Asn Tyr Arg Val Glu Leu Pro Thr Trp Glu Ala Glu Ala Ile Ile Gln
590 595 600 

AAG ATC AAG CAA CAG CAG TAT TCA GAC CTT CTG GCC CAC GAC CAA CTG 1995 
Lys Ile Lys Gln Gln Gln Tyr Ser Asp Leu Leu Ala His Asp Gln Leu
605 610 615 

CTC CTG GAG AGG AAG GAC CAG AAG GTC TTC CTG CAC TTT GAG GAG GAA 2043 
Leu Leu Glu Arg Lys Asp Gln Lys Val Phe Leu His Phe Glu Glu Glu
620 625 630 635 

GAG ATC ACC TTC GCC CCC ACC TAT CGA TTT GAA AGA CTG ACC CGG GAC 2091 
Glu Ile Thr Phe Ala Pro Thr Tyr Arg Phe Glu Arg Leu Thr Arg Asp
640 645 650 

AAG TAT GCA TAC ACG AAG CAG AAA GCA ACA GGG ATG AAG TAC AAC TTG 2139 
Lys Tyr Ala Tyr Thr Lys Gln Lys Ala Thr Gly Met Lys Tyr Asn Leu
655 660 665 

CCG TCC TGG TGC GAC CGA GTC CTC TGG AAG TCT TAC CCG CTG GTG CAT 2187 
Pro Ser Trp Cys Asp Arg Val Leu Trp Lys Ser Tyr Pro Leu Val His 
670 675 680 

GTG GTC TGT CAG TCC TAT GGC AGT ACC AGT GAC ATC ATG ACG AGT GAC 2235 
Val Val Cys Gln Ser Tyr Gly Ser Thr Ser Asp Ile Met Thr Ser Asp
685 690 695 

CAC AGC CCT GTC TTT GCC ACG TTT GAA GCA GGA GTC ACA TCT CAA TTC 2283 
His Ser Pro Val Phe Ala Thr Phe Glu Ala Gly Val Thr Ser Gln Phe
700 705 710 715 

GTC TCC AAG AAT GGT CCT GGC ACT GTA GAT AGC CAA GGG CAG ATC GAG 2331 
Val Ser Lys Asn Gly Pro Gly Thr Val Asp Ser Gln Gly Gln Ile Glu
720 725 730 

TTT CTT GCA TGC TAC GCC ACA CTG AAG ACC AAG TCC CAG ACT AAG TTC 2379 
Phe Leu Ala Cys Tyr Ala Thr Leu Lys Thr Lys Ser Gln Thr Lys Phe
735 740 745 

TAC TTG GAG TTC CAC TCA AGC TGC TTA GAG AGT TTT GTC AAG AGT CAG 2427 
Tyr Leu Glu Phe His Ser Ser Cys Leu Glu Ser Phe Val Lys Ser Gln
750 755 760

GAA GGA GAG AAT GAA GAG GGA AGT GAA GGA GAG CTG GTG GTA CGG TTT 2475
Glu Gly Glu Asn Glu Glu Gly Ser Glu Gly Glu Leu Val Val Arg Phe 
765 770 775 

GGA GAG ACT CTT CCC AAG CTA AAG CCC ATT ATC TCT GAC CCC GAG TAC 2523 
Gly Glu Thr Leu Pro Lys Leu Lys Pro Ile Ile Ser Asp Pro Glu Tyr
780 785 790 795 

TTA CTG GAC CAG CAT ATC CTG ATC AGC ATT AAA TCC TCT GAC AGT GAC 2571 
Leu Leu Asp Gln His Ile Leu Ile Ser Ile Lys Ser Ser Asp Ser Asp
800 805 810 

GAG TCC TAT GGT GAA GGC TGC ATT GCC CTT CGC TTG GAG ACC ACA GAG 2619 
Glu Ser Tyr Gly Glu Gly Cys Ile Ala Leu Arg Leu Glu Thr Thr Glu
815 820 825 

GCT CAG CAT CCT ATC TAC ACG CCT CTC ACC CAC CAT GGG GAG ATG ACT 2667 
Ala Gln His Pro Ile Tyr Thr Pro Leu Thr His His Gly Glu Met Thr
830 835 840 

GGC CAC TTC AGG GGA GAG ATT AAG CTG CAG ACC TCC CAG GGC AAG ATG 2715 
Gly His Phe Arg Gly Glu Ile Lys Leu Gln Thr Ser Gln Gly Lys Met
845 850 855 

AGG GAG AAG CTC TAT GAC TTT GTG AAG ACA GAG CGG GAT GAA TCC AGT 2763 
Arg Glu Lys Leu Tyr Asp Phe Val Lys Thr Glu Arg Asp Glu Ser Ser 
860 865 870 875 

GGA ATG AAA TGC TTG AAG AAC CTC ACC AGC CAT GAC CCT ATG AGG CAA 2811 
Gly Met Lys Cys Leu Lys Asn Leu Thr Ser His Asp Pro Met Arg Gln
880 885 890 

TGG GAG CCT TCT GGC AGG GTC CCT GCA TGT GGT GTC TCC AGC CTC AAT 2859
Trp Glu Pro Ser Gly Arg Val Pro Ala Cys Gly Val Ser Ser Leu Asn 
895 900 905 

GAG ATG ATC AAT CCA AAC TAC ATT GGT ATG GGG CCT TTT GGA CAG CCC 2907
Glu Met Ile Asn Pro Asn Tyr Ile Gly Met Gly Pro Phe Gly Gln Pro
910 915 920 

CTG CAT GGG AAA TCA ACC CTG TCC CCA GAT CAG CAA CTC ACA GCT TGG 2955 
Leu His Gly Lys Ser Thr Leu Ser Pro Asp Gln Gln Leu Thr Ala Trp
925 930 935 

AGT TAT GAC CAG CTA CCC AAA GAC TCC TCC CTG GGG CCT GGG AGG GGG 3003 
Ser Tyr Asp Gln Leu Pro Lys Asp Ser Ser Leu Gly Pro Gly Arg Gly
940 945 950 955 

GAG GGT CCT CCA ACC CCT CCC TCC CAA CCA CCT CTG TCG CCA AAG AAG 3051 
Glu Gly Pro Pro Thr Pro Pro Ser Gln Pro Pro Leu Ser Pro Lys Lys
960 965 970 

TTT TCA TCT TCC ACA ACC AAC CGA GGT CCC TGC CCC AGG GTG CAA GAG 3099 
Phe Ser Ser Ser Thr Thr Asn Arg Gly Pro Cys Pro Arg Val Gln Glu
975 980 985 

GCA AGA CCT GGG GAT CTG GGA AAG GTG GAA GCT CTG CTC CAG GAG GAC 3147 
Ala Arg Pro Gly Asp Leu Gly Lys Val Glu Ala Leu Leu Gln Glu Asp
990 995 1000 

CTG CTG CTG ACG AAG CCC GAG ATG TTT GAG AAC CCA CTG TAT GGA TCC 3195 
Leu Leu Leu Thr Lys Pro Glu Met Phe Glu Asn Pro Leu Tyr Gly Ser 
1005 1010 1015 

GTG AGT TCC TTC CCT AAG CTG GTG CCC AGG AAA GAG CAG GAG TCT CCC 3243 
Val Ser Ser Phe Pro Lys Leu Val Pro Arg Lys Glu Gln Glu Ser Pro
1020 1025 1030 1035

AAG ATG CTG CGG AAG GAG CCC CCG CCC TGT CCA GAC CCA GGA ATC TCA 3291 
Lys Met Leu Arg Lys Glu Pro Pro Pro Cys Pro Asp Pro Gly Ile Ser
1040 1045 1050 

TCA CCC AGC ATC GTG CTC CCC AAA GCC CAA GAG GTG GAG AGT GTC AAG 3339 
Ser Pro Ser Ile Val Leu Pro Lys Ala Gln Glu Val Glu Ser Val Lys
1055 1060 1065 

GGG ACA AGC AAA CAG GCC CCT GTG CCT GTC CTT GGC CCC ACA CCC CGG 3387
Gly Thr Ser Lys Gln Ala Pro Val Pro Val Leu Gly Pro Thr Pro Arg
1070 1075 1080 

ATC CGC TCC TTT ACC TGT TCT TCT TCT GCT GAG GGC AGA ATG ACC AGT 3435 
Ile Arg Ser Phe Thr Cys Ser Ser Ser Ala Glu Gly Arg Met Thr Ser
1085 1090 1095 

GGG GAC AAG AGC CAA GGG AAG CCC AAG GCC TCA GCC AGT TCC CAA GCC 3483 
Gly Asp Lys Ser Gln Gly Lys Pro Lys Ala Ser Ala Ser Ser Gln Ala
1100 1105 1110 1115 

CCA GTG CCA GTC AAG AGG CCT GTC AAG CCT TCC AGG TCA GAA ATG AGC 3531
Pro Val Pro Val Lys Arg Pro Val Lys Pro Ser Arg Ser Glu Met Ser 
1120 1125 1130 

CAG CAG ACA ACA CCC ATC CCA GCT CCA CGG CCA CCC CTG CCA GTC AAG 3579
Gln Gln Thr Thr Pro Ile Pro Ala Pro Arg Pro Pro Leu Pro Val Lys
1135 1140 1145 

AGT CCT GCT GTC CTG CAG CTG CAA CAT TCC AAA GGC AGA GAC TAC CGT 3627
Ser Pro Ala Val Leu Gln Leu Gln His Ser Lys Gly Arg Asp Tyr Arg
1150 1155 1160 

GAC AAC ACA GAA CTC CCC CAC CAT GGC AAG CAC CGC CAA GAG GAG GGG 3675 
Asp Asn Thr Glu Leu Pro His His Gly Lys His Arg Gln Glu Glu Gly
1165 1170 1175 

CTG CTT GGC AGG ACT GCC ATGCAGTGAG CTGCTGGTGA TCGGAGCCTG 3723 
Leu Leu Gly Arg Thr Ala 
1180 1185 


GAGGAACAGC ACAAAGCAGA CCTGCGACCT CTCTCAGGAT GCCTCTCTCA GGATGCCTCT 3783

TGGAGGACCT CCTGCTAGCT CTTCTTGCCT AGCTTCAAGT CCCAGGCTGT GTATTTTTTT 3843

TCAGGAAACG GCCTCACTTC TCTGTGGTCC AAGAAGTGTG CTGCTGGCTG CCACACTGTG 3903

CGGCAGATGC TAAAGCTGGA TGACAAACGC ACGCCATACA GACAGCAGAC AGCGGCACTG 3963

GGTCTCAGAA CTTGGATTCC TGGGCCTTCT TCCAGTCGCC GTTTTAAAGA AAGGAACTAA 4023

CGGAGCTGCT CATCCGA 4040
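
As a consistency check on the listing above, the CDS feature location (139..3693) can be converted to a codon count and the first codons translated. The following is a minimal sketch using the standard genetic code; `CODON_TABLE`, `cds_codon_count` and `translate` are illustrative helper names and are not part of the original listing.

```python
# Illustrative consistency check for SEQ ID NO:1 (helper names are hypothetical).
# The standard genetic code is built from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def cds_codon_count(start, end):
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


def translate(dna):
    """Translate a DNA string (spaces allowed) codon by codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# CDS location taken from the FEATURE entry of SEQ ID NO:1.
n_codons = cds_codon_count(139, 3693)

# First eleven codons of the CDS as printed in the listing.
first_codons = "ATG GTC CCT GGG TGG AAC CAT GGC AAC ATC ACC"
peptide = translate(first_codons)

print(n_codons)
print(peptide)
```

The codon count obtained this way can be compared with the length stated for SEQ ID NO:2 (1185 amino acids), and the translated peptide with the first residues printed beneath the nucleotide sequence.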



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1185 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Val Pro Gly Trp Asn His Gly Asn Ile Thr Arg Ser Lys Ala Glu
1 5 10 15 

Glu Leu Leu Ser Arg Ala Gly Lys Asp Gly Ser Phe Leu Val Arg Ala 
20 25 30 

Ser Glu Ser Ile Pro Arg Ala Cys Ala Leu Cys Val Leu Phe Arg Asn
35 40 45 

Cys Val Tyr Thr Tyr Arg Ile Leu Pro Asn Glu Asp Asp Lys Phe Thr
50 55 60 

Val Gln Ala Ser Glu Gly Val Pro Met Arg Phe Phe Thr Lys Leu Asp
65 70 75 80 

Gln Leu Ile Asp Phe Tyr Lys Lys Glu Asn Met Gly Leu Val Thr His
85 90 95 

Leu Gln Tyr Pro Val Pro Leu Glu Glu Glu Asp Ala Ile Asp Glu Ala
100 105 110 

Glu Glu Asp Thr Glu Ser Val Met Ser Pro Pro Glu Leu Pro Pro Arg 
115 120 125 

Asn Ile Pro Met Ser Ala Gly Pro Ser Glu Ala Lys Asp Leu Pro Leu
130 135 140 

Ala Thr Glu Asn Pro Arg Ala Pro Glu Val Thr Arg Leu Ser Leu Ser 
145 150 155 160 

Glu Thr Leu Phe Gln Arg Leu Gln Ser Met Asp Thr Ser Gly Leu Pro
165 170 175 

Glu Glu His Leu Lys Ala Ile Gln Asp Tyr Leu Ser Thr Gln Leu Leu
180 185 190 

Leu Asp Ser Asp Phe Leu Lys Thr Gly Ser Ser Asn Leu Pro His Leu 
195 200 205 

Lys Lys Leu Met Ser Leu Leu Cys Lys Glu Leu His Gly Glu Val Ile
210 215 220 

Arg Thr Leu Pro Ser Leu Glu Ser Leu Gln Arg Leu Phe Asp Gln Gln
225 230 235 240 

Leu Ser Pro Gly Leu Arg Pro Arg Pro Gln Val Pro Gly Glu Ala Ser
245 250 255 

Pro Ile Thr Met Val Ala Lys Leu Ser Gln Leu Thr Ser Leu Leu Ser
260 265 270 

Ser Ile Glu Asp Lys Val Lys Ser Leu Leu His Glu Gly Ser Glu Ser
275 280 285 

Thr Asn Arg Arg Ser Leu Ile Pro Pro Val Thr Phe Glu Val Lys Ser
290 295 300 

Glu Ser Leu Gly Ile Pro Gln Lys Met His Leu Lys Val Asp Val Glu
305 310 315 320 

Ser Gly Lys Leu Ile Val Lys Lys Ser Lys Asp Gly Ser Glu Asp Lys
325 330 335 

Phe Tyr Ser His Lys Lys Ile Leu Gln Leu Ile Lys Ser Gln Lys Phe
340 345 350

Leu Asn Lys Leu Val Ile Leu Val Glu Thr Glu Lys Glu Lys Ile Leu
355 360 365 

Arg Lys Glu Tyr Val Phe Ala Asp Ser Lys Lys Arg Glu Gly Phe Cys 
370 375 380 

Gln Leu Leu Gln Gln Met Lys Asn Lys His Ser Glu Gln Pro Glu Pro
385 390 395 400 

Asp Met Ile Thr Ile Phe Ile Gly Thr Trp Asn Met Gly Asn Ala Pro
405 410 415 

Pro Pro Lys Lys Ile Thr Ser Trp Phe Leu Ser Lys Gly Gln Gly Lys
420 425 430 

Thr Arg Asp Asp Ser Ala Asp Tyr Ile Pro His Asp Ile Tyr Val Ile
435 440 445 

Gly Thr Gln Glu Asp Pro Leu Gly Glu Lys Glu Trp Leu Glu Leu Leu
450 455 460 

Arg His Ser Leu Gln Glu Val Thr Ser Met Thr Phe Lys Thr Val Ala
465 470 475 480 

Ile His Thr Leu Trp Asn Ile Arg Ile Val Val Leu Ala Lys Pro Glu
485 490 495 

His Glu Asn Arg Ile Ser His Ile Cys Thr Asp Asn Val Lys Thr Gly
500 505 510 

Ile Ala Asn Thr Leu Gly Asn Lys Gly Ala Val Gly Val Ser Phe Met
515 520 525 

Phe Asn Gly Thr Ser Leu Gly Phe Val Asn Ser His Leu Thr Ser Gly 
530 535 540 

Ser Glu Lys Lys Leu Arg Arg Asn Gln Asn Tyr Met Asn Ile Leu Arg
545 550 555 560

Phe Leu Ala Leu Gly Asp Lys Lys Leu Ser Pro Phe Asn Ile Thr His
565 570 575 

Arg Phe Thr His Leu Phe Trp Leu Gly Asp Leu Asn Tyr Arg Val Glu 
580 585 590 

Leu Pro Thr Trp Glu Ala Glu Ala Ile Ile Gln Lys Ile Lys Gln Gln
595 600 605 

Gln Tyr Ser Asp Leu Leu Ala His Asp Gln Leu Leu Leu Glu Arg Lys
610 615 620 

Asp Gln Lys Val Phe Leu His Phe Glu Glu Glu Glu Ile Thr Phe Ala
625 630 635 640 

Pro Thr Tyr Arg Phe Glu Arg Leu Thr Arg Asp Lys Tyr Ala Tyr Thr 
645 650 655 

Lys Gln Lys Ala Thr Gly Met Lys Tyr Asn Leu Pro Ser Trp Cys Asp
660 665 670 

Arg Val Leu Trp Lys Ser Tyr Pro Leu Val His Val Val Cys Gln Ser
675 680 685 

Tyr Gly Ser Thr Ser Asp Ile Met Thr Ser Asp His Ser Pro Val Phe
690 695 700

Ala Thr Phe Glu Ala Gly Val Thr Ser Gln Phe Val Ser Lys Asn Gly
705 710 715 720 

Pro Gly Thr Val Asp Ser Gln Gly Gln Ile Glu Phe Leu Ala Cys Tyr
725 730 735 

Ala Thr Leu Lys Thr Lys Ser Gln Thr Lys Phe Tyr Leu Glu Phe His
740 745 750

Ser Ser Cys Leu Glu Ser Phe Val Lys Ser Gln Glu Gly Glu Asn Glu
755 760 765 

Glu Gly Ser Glu Gly Glu Leu Val Val Arg Phe Gly Glu Thr Leu Pro 
770 775 780 

Lys Leu Lys Pro Ile Ile Ser Asp Pro Glu Tyr Leu Leu Asp Gln His
785 790 795 800 

Ile Leu Ile Ser Ile Lys Ser Ser Asp Ser Asp Glu Ser Tyr Gly Glu
805 810 815 

Gly Cys Ile Ala Leu Arg Leu Glu Thr Thr Glu Ala Gln His Pro Ile
820 825 830 

Tyr Thr Pro Leu Thr His His Gly Glu Met Thr Gly His Phe Arg Gly 
835 840 845 

Glu Ile Lys Leu Gln Thr Ser Gln Gly Lys Met Arg Glu Lys Leu Tyr
850 855 860 

Asp Phe Val Lys Thr Glu Arg Asp Glu Ser Ser Gly Met Lys Cys Leu 
865 870 875 880 

Lys Asn Leu Thr Ser His Asp Pro Met Arg Gln Trp Glu Pro Ser Gly
885 890 895 

Arg Val Pro Ala Cys Gly Val Ser Ser Leu Asn Glu Met Ile Asn Pro
900 905 910 

Asn Tyr Ile Gly Met Gly Pro Phe Gly Gln Pro Leu His Gly Lys Ser
915 920 925 

Thr Leu Ser Pro Asp Gln Gln Leu Thr Ala Trp Ser Tyr Asp Gln Leu
930 935 940 

Pro Lys Asp Ser Ser Leu Gly Pro Gly Arg Gly Glu Gly Pro Pro Thr 
945 950 955 960 

Pro Pro Ser Gln Pro Pro Leu Ser Pro Lys Lys Phe Ser Ser Ser Thr
965 970 975 

Thr Asn Arg Gly Pro Cys Pro Arg Val Gln Glu Ala Arg Pro Gly Asp
980 985 990

Leu Gly Lys Val Glu Ala Leu Leu Gln Glu Asp Leu Leu Leu Thr Lys
995 1000 1005 

Pro Glu Met Phe Glu Asn Pro Leu Tyr Gly Ser Val Ser Ser Phe Pro 
1010 1015 1020 

Lys Leu Val Pro Arg Lys Glu Gln Glu Ser Pro Lys Met Leu Arg Lys
1025 1030 1035 1040 

Glu Pro Pro Pro Cys Pro Asp Pro Gly Ile Ser Ser Pro Ser Ile Val
1045 1050 1055

Leu Pro Lys Ala Gln Glu Val Glu Ser Val Lys Gly Thr Ser Lys Gln
1060 1065 1070 

Ala Pro Val Pro Val Leu Gly Pro Thr Pro Arg Ile Arg Ser Phe Thr
1075 1080 1085 

Cys Ser Ser Ser Ala Glu Gly Arg Met Thr Ser Gly Asp Lys Ser Gln
1090 1095 1100 

Gly Lys Pro Lys Ala Ser Ala Ser Ser Gln Ala Pro Val Pro Val Lys
1105 1110 1115 1120 

Arg Pro Val Lys Pro Ser Arg Ser Glu Met Ser Gln Gln Thr Thr Pro
1125 1130 1135 

Ile Pro Ala Pro Arg Pro Pro Leu Pro Val Lys Ser Pro Ala Val Leu
1140 1145 1150 

Gln Leu Gln His Ser Lys Gly Arg Asp Tyr Arg Asp Asn Thr Glu Leu
1155 1160 1165 

Pro His His Gly Lys His Arg Gln Glu Glu Gly Leu Leu Gly Arg Thr
1170 1175 1180 

Ala 
1185 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3031 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(B) STRAIN: Shc Proteins

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 82..1503

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

GCGGTAACCT AAGCTGGCAG TGGCGTGATC CGGCACCAAA TCGGCCCGCG GTGCGTGCGG 60 

AGACTCCATG AGGCCCTGGA C ATG AAC AAG CTG AGT GGA GGC GGC GGG CGC 111 

Met Asn Lys Leu Ser Gly Gly Gly Gly Arg 
1 5 10 

AGG ACT CGG GTG GAA GGG GGC CAG CTT GGG GGC GAG GAG TGG ACC CGC 159 
Arg Thr Arg Val Glu Gly Gly Gln Leu Gly Gly Glu Glu Trp Thr Arg
15 20 25 

CAC GGG AGC TTT GTC AAT AAG CCC ACG CGG GGC TGG CTG CAT CCC AAC 207 
His Gly Ser Phe Val Asn Lys Pro Thr Arg Gly Trp Leu His Pro Asn 
30 35 40 

GAC AAA GTC ATG GGA CCC GGG GTT TCC TAC TTG GTT CGG TAC ATG GGT 255 
Asp Lys Val Met Gly Pro Gly Val Ser Tyr Leu Val Arg Tyr Met Gly 
45 50 55

TGT GTG GAG GTC CTC CAG TCA ATG CGT GCC CTG GAC TTC AAC ACC CGG 303
Cys Val Glu Val Leu Gln Ser Met Arg Ala Leu Asp Phe Asn Thr Arg
60 65 70

ACT CAG GTC ACC AGG GAG GCC ATC AGT CTG GTG TGT GAG GCT GTG CCG 351
Thr Gln Val Thr Arg Glu Ala Ile Ser Leu Val Cys Glu Ala Val Pro
75 80 85 90 

GGT GCT AAG GGG GCG ACA AGG AGG AGA AAG CCC TGT AGC CGC CCG CTC 399
Gly Ala Lys Gly Ala Thr Arg Arg Arg Lys Pro Cys Ser Arg Pro Leu 
95 100 105 

AGC TCT ATC CTG GGG AGG AGT AAC CTG AAA TTT GCT GGA ATG CCA ATC 447 
Ser Ser Ile Leu Gly Arg Ser Asn Leu Lys Phe Ala Gly Met Pro Ile
110 115 120 

ACT CTC ACC GTC TCC ACC AGC AGC CTC AAC CTC ATG GCC GCA GAC TGC 495 
Thr Leu Thr Val Ser Thr Ser Ser Leu Asn Leu Met Ala Ala Asp Cys 
125 130 135 

AAA CAG ATC ATC GCC AAC CAC CAC ATG CAA TCT ATC TCA TTT GCA TCC 543 
Lys Gln Ile Ile Ala Asn His His Met Gln Ser Ile Ser Phe Ala Ser
140 145 150 

GGC GGG GAT CCG GAC ACA GCC GAG TAT GTC GCC TAT GTT GCC AAA GAC 591
Gly Gly Asp Pro Asp Thr Ala Glu Tyr Val Ala Tyr Val Ala Lys Asp 
155 160 165 170 

CCT GTG AAT CAG AGA GCC TGC CAC ATT CTG GAG TGT CCC GAA GGG CTT 639 
Pro Val Asn Gln Arg Ala Cys His Ile Leu Glu Cys Pro Glu Gly Leu
175 180 185 

GCC CAG GAT GTC ATC AGC ACC ATT GGC CAG GCC TTC GAG TTG CGC TTC 687 
Ala Gln Asp Val Ile Ser Thr Ile Gly Gln Ala Phe Glu Leu Arg Phe
190 195 200 

AAA CAA TAC CTC AGG AAC CCA CCC AAA CTG GTC ACC CCT CAT GAC AGG 735 
Lys Gln Tyr Leu Arg Asn Pro Pro Lys Leu Val Thr Pro His Asp Arg
205 210 215 

ATG GCT GGC TTT GAT GGC TCA GCA TGG GAT GAG GAG GAG GAA GAG CCA 783 
Met Ala Gly Phe Asp Gly Ser Ala Trp Asp Glu Glu Glu Glu Glu Pro 
220 225 230 

CCT GAC CAT CAG TAC TAT AAT GAC TTC CCG GGG AAG GAA CCC CCC TTG 831 
Pro Asp His Gln Tyr Tyr Asn Asp Phe Pro Gly Lys Glu Pro Pro Leu
235 240 245 250 

GGG GGG GTG GTA GAC ATG AGG CTT CGG GAA GGA GCC GCT CCA GGG GCT 879 
Gly Gly Val Val Asp Met Arg Leu Arg Glu Gly Ala Ala Pro Gly Ala 
255 260 265 

GCT CGA CCC ACT GCA CCC AAT GCC CAG ACC CCC AGC CAC TTG GGA GCT 927 
Ala Arg Pro Thr Ala Pro Asn Ala Gln Thr Pro Ser His Leu Gly Ala
270 275 280 

ACA TTG CCT GTA GGA CAG CCT GTT GGG GGA GAT CCA GAA GTC CGC AAA 975 
Thr Leu Pro Val Gly Gln Pro Val Gly Gly Asp Pro Glu Val Arg Lys
285 290 295 

CAG ATG CCA CCT CCA CCA CCC TGT CCA GGC AGA GAG CTT TTT GAT GAT 1023 
Gln Met Pro Pro Pro Pro Pro Cys Pro Gly Arg Glu Leu Phe Asp Asp
300 305 310 

CCC TCC TAT GTC AAC GTC CAG AAC CTA GAC AAG GCC CGG CAA GCA GTG 1071 
Pro Ser Tyr Val Asn Val Gln Asn Leu Asp Lys Ala Arg Gln Ala Val
315 320 325 330

GGT GGT GCT GGG CCC CCC AAT CCT GCT ATC AAT GGC AGT GCA CCC CGG 1119 
Gly Gly Ala Gly Pro Pro Asn Pro Ala Ile Asn Gly Ser Ala Pro Arg
335 340 345 

GAC CTG TTT GAC ATG AAG CCC TTC GAA GAT GCT CTT CGG GTG CCT CCA 1167
Asp Leu Phe Asp Met Lys Pro Phe Glu Asp Ala Leu Arg Val Pro Pro 
350 355 360 

CCT CCC CAG TCG GTG TCC ATG GCT GAG CAG CTC CGA GGG GAG CCC TGG 1215 
Pro Pro Gln Ser Val Ser Met Ala Glu Gln Leu Arg Gly Glu Pro Trp
365 370 375 

TTC CAT GGG AAG CTG AGC CGG CGG GAG GCT GAG GCA CTG CTG CAG CTC 1263 
Phe His Gly Lys Leu Ser Arg Arg Glu Ala Glu Ala Leu Leu Gln Leu
380 385 390

AAT GGG GAC TTC TTG GTA CGG GAG AGC ACG ACC ACA CCT GGC CAG TAT 1311 
Asn Gly Asp Phe Leu Val Arg Glu Ser Thr Thr Thr Pro Gly Gln Tyr
395 400 405 410

GTG CTC ACT GGC TTG CAG AGT GGG CAG CCT AAG CAT TTG CTA CTG GTG 1359 
Val Leu Thr Gly Leu Gln Ser Gly Gln Pro Lys His Leu Leu Leu Val
415 420 425 

GAC CCT GAG GGT GTG GTT CGG ACT AAG GAT CAC CGC TTT GAA AGT GTC 1407 
Asp Pro Glu Gly Val Val Arg Thr Lys Asp His Arg Phe Glu Ser Val 
430 435 440 

AGT CAC CTT ATC AGC TAC CAC ATG GAC AAT CAC TTG CCC ATC ATC TCT 1455 
Ser His Leu Ile Ser Tyr His Met Asp Asn His Leu Pro Ile Ile Ser
445 450 455 

GCG GGC AGC GAA CTG TGT CTA CAG CAA CCT GTG GAG CGG AAA CTG TGA 1503 
Ala Gly Ser Glu Leu Cys Leu Gln Gln Pro Val Glu Arg Lys Leu *
460 465 470 

TCTGCCCTAG CGCTCTCTTC CAGAAGATGC CCTCCAATCC TTTCCACCCT ATTCCCTAAC 1563 

TCTCGGGACC TCGTTTGGGA GTGTTCTGTG GGCTTGGCCT TGTGTCAGAG CTGGGAGTAG 1623 

CATGGACTCT GGGTTTCATA TCCAGCTGAG TGAGAGGGTT TGAGTCAAAA GCCTGGGTGA 1683 

GAATCCTGCC TCTCCCCAAA CATTAATCAC CAAAGTATTA ATGTACAGAG TGGCCCCTCA 1743 

CCTGGGCCTT TCCTGTGCCA ACCTGATGCC CCTTCCCCAA GAAGGTGAGT GCTTGTCATG 1803 

GAAAATGTCC TGTGGTGACA GGCCCAGTGG AACAGTCACC CTTCTGGGCA AGGGGGAACA 1863 

AATCACACCT CTGGGCTTCA GGGTATCCCA GACCCCTCTC AACACCCGCC CCCCCCATGT 1923 

TTAAACTTTG TGCCTTTGAC CATCTCTTAG GTCTAATGAT ATTTTATGCA AACAGTTCTT 1983

GGACCCCTGA ATTCTTCAAT GACAGGGATG CCAACACCTT CTTGGCTTCT GGGACCTGTG 2043 

TTCTTGCTGA GCACCCTCTC CGGTTTGGGT TGGGATAACA GAGGCAGGAG TGGCAGCTGT 2103 

CCCCTCTCCC TGGGGATATG CAACCCTTAG AGATTGCCCC AGAGCCCCAC TCCCGGCCAG 2163 

GCGGGAGATG GACCCCTCCC TTGCTCAGTG CCTCCTGGCC GGGGCCCCTC ACCCCAAGGG 2223 

GTCTGTATAT ACATTTCATA AGGCCTGCCC TCCCATGTTG CATGCCTATG TACTCTGCGC 2283 

CAAAGTGCAG CCCTTCCTCC TGAAGCCTCT GCCCTGCCTC CCTTTCTGGG AGGGCGGGGT 2343 

GGGGGTGACT GAATTTGGGC CTCTTGTACA GTTAACTCTC CCAGGTGGAT TTTGTGGAGG 2403 

TGAGAAAAGG GGCATTGAGA CTATAAAGCA GTAGACAATC CCCACATACC ATCTGTAGAG 2463 

TTGGAACTGC ATTCTTTTAA AGTTTTATAT GCATATATTT TAGGGCTGCT AGACTTACTT 2523 

TCCTATTTTC TTTTCCATTG CTTATTCTTG AGCACAAAAT GATAATCAAT TATTACATTT 2583 

ATACATCACC TTTTTGACTT TTCCAAGCCC TTTTACAGCT CTTGGCATTT TCCTCGCCTA 2643 

GGCCTGTGAG GTAACTGGGA TCGCACCTTT TATACCAGAG ACCTGAGGCA GATGAAATTT 2703 

ATTTCCATCT AGGACTAGAA AAACTTGGGT CTCTTACCGC GAGACTGAGA GGCAGAAGTC 2763 

AGCCCGAATG CCTGTCAGTT TCATGGAGGG GAAACGCAAA ACCTGCAGTT CCTGAGTACC 2823 

TTCTACAGGC CCGGCCCAGC CTAGGCCCGG GGTGGCCACA CCACAGCAAG CCGGCCCCCC 2883 

CTCTTTTGGC CTTGTGGATA AGGGAGAGTT GACCGTTTTC ATCCTGGCCT CCTTTTGCTG 2943 

TTTGGATGTT TCCACGGGTC TCACTTATAC CAAAGGGAAA ACTCTTCATT AAAGTCCCGT 3003 

ATTTCTTCTA AAAAAAAAAA AAAAAAAA 3031 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 474 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asn Lys Leu Ser Gly Gly Gly Gly Arg Arg Thr Arg Val Glu Gly 
1 5 10 15 

Gly Gln Leu Gly Gly Glu Glu Trp Thr Arg His Gly Ser Phe Val Asn 
20 25 30 

Lys Pro Thr Arg Gly Trp Leu His Pro Asn Asp Lys Val Met Gly Pro 
35 40 45 

Gly Val Ser Tyr Leu Val Arg Tyr Met Gly Cys Val Glu Val Leu Gln 
50 55 60 

Ser Met Arg Ala Leu Asp Phe Asn Thr Arg Thr Gln Val Thr Arg Glu 
65 70 75 80 

Ala Ile Ser Leu Val Cys Glu Ala Val Pro Gly Ala Lys Gly Ala Thr 
85 90 95 

Arg Arg Arg Lys Pro Cys Ser Arg Pro Leu Ser Ser Ile Leu Gly Arg 
100 105 110 

Ser Asn Leu Lys Phe Ala Gly Met Pro Ile Thr Leu Thr Val Ser Thr 
115 120 125 

Ser Ser Leu Asn Leu Met Ala Ala Asp Cys Lys Gln Ile Ile Ala Asn 
130 135 140 

His His Met Gln Ser Ile Ser Phe Ala Ser Gly Gly Asp Pro Asp Thr 
145 150 155 160 

Ala Glu Tyr Val Ala Tyr Val Ala Lys Asp Pro Val Asn Gln Arg Ala 
165 170 175 

Cys His Ile Leu Glu Cys Pro Glu Gly Leu Ala Gln Asp Val Ile Ser 
180 185 190 

Thr Ile Gly Gln Ala Phe Glu Leu Arg Phe Lys Gln Tyr Leu Arg Asn 
195 200 205 

Pro Pro Lys Leu Val Thr Pro His Asp Arg Met Ala Gly Phe Asp Gly 
210 215 220 

Ser Ala Trp Asp Glu Glu Glu Glu Glu Pro Pro Asp His Gln Tyr Tyr 
225 230 235 240 

Asn Asp Phe Pro Gly Lys Glu Pro Pro Leu Gly Gly Val Val Asp Met 
245 250 255 

Arg Leu Arg Glu Gly Ala Ala Pro Gly Ala Ala Arg Pro Thr Ala Pro 
260 265 270 

Asn Ala Gln Thr Pro Ser His Leu Gly Ala Thr Leu Pro Val Gly Gln 
275 280 285 

Pro Val Gly Gly Asp Pro Glu Val Arg Lys Gln Met Pro Pro Pro Pro 
290 295 300 

Pro Cys Pro Gly Arg Glu Leu Phe Asp Asp Pro Ser Tyr Val Asn Val 
305 310 315 320 

Gln Asn Leu Asp Lys Ala Arg Gln Ala Val Gly Gly Ala Gly Pro Pro 
325 330 335 

Asn Pro Ala Ile Asn Gly Ser Ala Pro Arg Asp Leu Phe Asp Met Lys 
340 345 350 

Pro Phe Glu Asp Ala Leu Arg Val Pro Pro Pro Pro Gln Ser Val Ser 
355 360 365 

Met Ala Glu Gln Leu Arg Gly Glu Pro Trp Phe His Gly Lys Leu Ser 
370 375 380 

Arg Arg Glu Ala Glu Ala Leu Leu Gln Leu Asn Gly Asp Phe Leu Val 
385 390 395 400 

Arg Glu Ser Thr Thr Thr Pro Gly Gln Tyr Val Leu Thr Gly Leu Gln 
405 410 415 

Ser Gly Gln Pro Lys His Leu Leu Leu Val Asp Pro Glu Gly Val Val 
420 425 430 

Arg Thr Lys Asp His Arg Phe Glu Ser Val Ser His Leu Ile Ser Tyr 
435 440 445 

His Met Asp Asn His Leu Pro Ile Ile Ser Ala Gly Ser Glu Leu Cys 
450 455 460 

Leu Gln Gln Pro Val Glu Arg Lys Leu * 
465 470 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1109 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: mRNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(B) STRAIN: GRB2 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 79..732 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GCCAGTGAAT TCGGGGGCTC AGCCCTCCTC CCTCCCTTCC CCCTGCTTCA GGCTGCTGAG 60 

CACTGAGCAG CGCTCAGA ATG GAA GCC ATC GCC AAA TAT GAC TTC AAA GCT 111 
Met Glu Ala Ile Ala Lys Tyr Asp Phe Lys Ala 
1 5 10 

ACT GCA GAC GAC GAG CTG AGC TTC AAA AGG GGG GAC ATC CTC AAG GTT 159 
Thr Ala Asp Asp Glu Leu Ser Phe Lys Arg Gly Asp Ile Leu Lys Val 
15 20 25 

TTG AAC GAA GAA TGT GAT CAG AAC TGG TAC AAG GCA GAG CTT AAT GGA 207 
Leu Asn Glu Glu Cys Asp Gln Asn Trp Tyr Lys Ala Glu Leu Asn Gly 
30 35 40 

AAA GAC GGC TTC ATT CCC AAG AAC TAC ATA GAA ATG AAA CCA CAT CCG 255 
Lys Asp Gly Phe Ile Pro Lys Asn Tyr Ile Glu Met Lys Pro His Pro 
45 50 55 

TGG TTT TTT GGC AAA ATC CCC AGA GCC AAG GCA GAA GAA ATG CTT AGC 303 
Trp Phe Phe Gly Lys Ile Pro Arg Ala Lys Ala Glu Glu Met Leu Ser 
60 65 70 75 

AAA CAG CGG CAC GAT GGG GCC TTT CTT ATC CGA GAG AGT GAG AGC GCT 351 
Lys Gln Arg His Asp Gly Ala Phe Leu Ile Arg Glu Ser Glu Ser Ala 
80 85 90 

CCT GGG GAC TTC TCC CTC TCT GTC AAG TTT GGA AAC GAT GTG CAG CAC 399 
Pro Gly Asp Phe Ser Leu Ser Val Lys Phe Gly Asn Asp Val Gln His 
95 100 105 

TTC AAG GTG CTC CGA GAT GGA GCC GGG AAG TAC TTC CTC TGG GTG GTG 447 
Phe Lys Val Leu Arg Asp Gly Ala Gly Lys Tyr Phe Leu Trp Val Val 
110 115 120 

AAG TTC AAT TCT TTG AAT GAG CTG GTG GAT TAT CAC AGA TCT ACA TCT 495 
Lys Phe Asn Ser Leu Asn Glu Leu Val Asp Tyr His Arg Ser Thr Ser 
125 130 135 

GTC TCC AGA AAC CAG CAG ATA TTC CTG CGG GAC ATA GAA CAG GTG CCA 543 
Val Ser Arg Asn Gln Gln Ile Phe Leu Arg Asp Ile Glu Gln Val Pro 
140 145 150 155 

CAG CAG CCG ACA TAC GTC CAG GCC CTC TTT GAC TTT GAT CCC CAG GAG 591 
Gln Gln Pro Thr Tyr Val Gln Ala Leu Phe Asp Phe Asp Pro Gln Glu 
160 165 170 

GAT GGA GAG CTG GGC TTC CGC CGG GGA GAT TTT ATC CAT GTC ATG GAT 639 
Asp Gly Glu Leu Gly Phe Arg Arg Gly Asp Phe Ile His Val Met Asp 
175 180 185 

AAC TCA GAC CCC AAC TGG TGG AAA GGA GCT TGC CAC GGG CAG ACC GGC 687 
Asn Ser Asp Pro Asn Trp Trp Lys Gly Ala Cys His Gly Gln Thr Gly 
190 195 200 

ATG TTT CCC CGC AAT TAT GTC ACC CCC GTG AAC CGG AAC GTC TAA 732 
Met Phe Pro Arg Asn Tyr Val Thr Pro Val Asn Arg Asn Val * 
205 210 215 

GAGTCAAGAA GCAATTATTT AAAGAAAGTG AAAAATGTAA AACACATACA AAAGAATTAA 792 

ACCCACAAGC TGCCTCTGAC AGCAGCCTGT GAGGGAGTGC AGAACACCTG GCCGGGTCAC 852 

CCTGTGACCC TCTCACTTTG GTTGGAACTT TAGGGGGTGG GAGGGGGCGT TGGATTTAAA 912 

AATGCCAAAA CTTACCTATA AATTAAGAAG AGTTTTTATT ACAAATTTTC ACTGCTGCTC 972 

CTCTTTCCCC TCCTTTGTCT TTTTTTTCAT CCTTTTTTCT CTTCTGTCCA TCAGTGCATG 1032 

ACGTTTAAGG CCACGTATAG TCCTAGCTGA CGCCAATAAT AAAAAACAAG AAACCAAAAA 1092 

AAAAAAACCC GAATTCA 1109 
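
The pairing of codon triplets with three-letter residue codes in SEQ ID NO: 5 can be checked mechanically. The following is a minimal Python sketch and is not part of the original listing. It assumes the standard genetic code; the names CODON_TO_RESIDUE and translate are hypothetical helpers, and the table covers only the codons found in the first eleven triplets of the coding region (positions 79 to 111).

```python
# Partial codon table (standard genetic code); an illustrative assumption,
# limited to the codons that occur in the first row of the SEQ ID NO: 5 CDS.
CODON_TO_RESIDUE = {
    "ATG": "Met", "GAA": "Glu", "GCC": "Ala", "ATC": "Ile", "AAA": "Lys",
    "TAT": "Tyr", "GAC": "Asp", "TTC": "Phe", "GCT": "Ala",
}


def translate(nucleotides):
    """Translate a space-separated or contiguous DNA string codon by codon."""
    seq = nucleotides.replace(" ", "").upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    # A KeyError here means the codon is outside this deliberately partial table.
    return [CODON_TO_RESIDUE[seq[i:i + 3]] for i in range(0, len(seq), 3)]


# First eleven codons of the CDS, copied from the listing (positions 79..111).
first_codons = "ATG GAA GCC ATC GCC AAA TAT GAC TTC AAA GCT"
print(" ".join(translate(first_codons)))
```

Under these assumptions the fourth codon, ATC, translates to Ile, in agreement with the residue line printed under the first row of the coding region.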

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 218 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Glu Ala Ile Ala Lys Tyr Asp Phe Lys Ala Thr Ala Asp Asp Glu 
1 5 10 15 

Leu Ser Phe Lys Arg Gly Asp Ile Leu Lys Val Leu Asn Glu Glu Cys 
20 25 30 

Asp Gln Asn Trp Tyr Lys Ala Glu Leu Asn Gly Lys Asp Gly Phe Ile 
35 40 45 

Pro Lys Asn Tyr Ile Glu Met Lys Pro His Pro Trp Phe Phe Gly Lys 
50 55 60 

Ile Pro Arg Ala Lys Ala Glu Glu Met Leu Ser Lys Gln Arg His Asp 
65 70 75 80 

Gly Ala Phe Leu Ile Arg Glu Ser Glu Ser Ala Pro Gly Asp Phe Ser 
85 90 95 

Leu Ser Val Lys Phe Gly Asn Asp Val Gln His Phe Lys Val Leu Arg 
100 105 110 

Asp Gly Ala Gly Lys Tyr Phe Leu Trp Val Val Lys Phe Asn Ser Leu 
115 120 125 

Asn Glu Leu Val Asp Tyr His Arg Ser Thr Ser Val Ser Arg Asn Gln 
130 135 140 

Gln Ile Phe Leu Arg Asp Ile Glu Gln Val Pro Gln Gln Pro Thr Tyr 
145 150 155 160 

Val Gln Ala Leu Phe Asp Phe Asp Pro Gln Glu Asp Gly Glu Leu Gly 
165 170 175 

Phe Arg Arg Gly Asp Phe Ile His Val Met Asp Asn Ser Asp Pro Asn 
180 185 190 

Trp Trp Lys Gly Ala Cys His Gly Gln Thr Gly Met Phe Pro Arg Asn 
195 200 205 

Tyr Val Thr Pro Val Asn Arg Asn Val * 
210 215 
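
As a consistency check, the CDS coordinates given in the FEATURE entries of this listing can be compared with the stated protein lengths. The sketch below is illustrative only and is not part of the original listing. The function name codon_count is hypothetical, and the coordinates and lengths are copied from the listing (CDS 79..732 with LENGTH 218 for SEQ ID NO: 5 and 6; CDS 113..3673 with LENGTH 1187 for SEQ ID NO: 7 and 8).

```python
def codon_count(start, end):
    """Number of codons in an inclusive, 1-based CDS interval."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("CDS interval is empty or not a whole number of codons")
    return length // 3


# (cds_start, cds_end, stated_length) as printed in the listing.
# For SEQ ID NO: 5/6 the interval ends on the TAA stop codon, so the
# stated length of 218 counts the terminal '*' as well as 217 residues.
entries = {
    "SEQ ID NO: 5/6": (79, 732, 218),
    "SEQ ID NO: 7/8": (113, 3673, 1187),
}

for name, (start, end, stated) in entries.items():
    print(name, "codons:", codon_count(start, end), "stated length:", stated)
```

Under these assumptions both intervals divide evenly into codons and match the stated lengths, which is also a quick way to catch a digit lost or split in the position column.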

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4870 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(vii) IMMEDIATE SOURCE: 
(B) CLONE: hSHIP 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 113..3673 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

CCCAAGAGGC AACGGGCGGC AGGTTGCAGT GGAGGGGCCT CCGCTCCCCT CGGTGGTGTG 60 

TGGGTCCTGG GGGTGCCTGC CGGCCCAGCC GAGGAGGCCC ACGCCCACCA TG GTC 115 

Val 
1 

CCC TGC TGG AAC CAT GGC AAC ATC ACC CGC TCC AAG GCG GAG GAG CTG 163 
Pro Cys Trp Asn His Gly Asn Ile Thr Arg Ser Lys Ala Glu Glu Leu 
5 10 15 

CTT TGC AGG ACA GGC AAG GAC GGG AGC TTC CTC GTG CGT GCC AGC GAG 211 
Leu Cys Arg Thr Gly Lys Asp Gly Ser Phe Leu Val Arg Ala Ser Glu 
20 25 30 

TCC ATC TTC CGG GCA TAC GCG CTC TGC GTG CTG TAT CGG AAT TGC GTT 259 
Ser Ile Phe Arg Ala Tyr Ala Leu Cys Val Leu Tyr Arg Asn Cys Val 
35 40 45 

TAT ACT TAC AGA ATT CTG CCC AAT GAA GAT GAT AAA TTC ACT GTT CAG 307 
Tyr Thr Tyr Arg Ile Leu Pro Asn Glu Asp Asp Lys Phe Thr Val Gln 
50 55 60 65 

GCA TCC GAA GGC GTC TCC ATG AGG TTC TTC ACC AAG CTG GAC CAG CTC 355 
Ala Ser Glu Gly Val Ser Met Arg Phe Phe Thr Lys Leu Asp Gln Leu 
70 75 80 

ATC GAG TTT TAC AAG AAG GAA AAC ATG GGG CTG GTG ACC CAT CTG CAA 403 
Ile Glu Phe Tyr Lys Lys Glu Asn Met Gly Leu Val Thr His Leu Gln 
85 90 95 

TAC CCT GTG CCG CTG GAG GAA GAG GAC ACA GGC GAC GAC CCT GAG GAG 451 
Tyr Pro Val Pro Leu Glu Glu Glu Asp Thr Gly Asp Asp Pro Glu Glu 
100 105 110 

GAC ACA GAA AGT GTC GTG TCT CCA CCC GAG CTG CCC CCA AGA AAC ATC 499 
Asp Thr Glu Ser Val Val Ser Pro Pro Glu Leu Pro Pro Arg Asn Ile 
115 120 125 

CCG CTG ACT GCC AGC TCC TGT GAG GCC AAG GAG GTT CCT TTT TCA AAC 547 
Pro Leu Thr Ala Ser Ser Cys Glu Ala Lys Glu Val Pro Phe Ser Asn 
130 135 140 145 

GAG AAT CCC CGA GCG ACC GAG ACC AGC CGG CCG AGC CTC TCC GAG ACA 595 
Glu Asn Pro Arg Ala Thr Glu Thr Ser Arg Pro Ser Leu Ser Glu Thr 
150 155 160 

TTG TTC CAG CGA CTG CAA AGC ATG GAC ACC AGT GGG CTT CCA GAA GAG 643 
Leu Phe Gln Arg Leu Gln Ser Met Asp Thr Ser Gly Leu Pro Glu Glu 
165 170 175 

CAT CTT AAG GCC ATC CAA GAT TAT TTA AGC ACT CAG CTC GCC CAG GAC 691 
His Leu Lys Ala Ile Gln Asp Tyr Leu Ser Thr Gln Leu Ala Gln Asp 
180 185 190 

TCT GAA TTT GTG AAG ACA GGG TCC AGC AGT CTT CCT CAC CTG AAG AAA 739 
Ser Glu Phe Val Lys Thr Gly Ser Ser Ser Leu Pro His Leu Lys Lys 
195 200 205 

CTG ACC ACA CTG CTC TGC AAG GAG CTC TAT GGA GAA GTC ATC CGG ACC 787 
Leu Thr Thr Leu Leu Cys Lys Glu Leu Tyr Gly Glu Val Ile Arg Thr 
210 215 220 225 

CTC CCA TCC CTG GAG TCT CTG CAG AGG TTA TTT GAC CAG CAG CTC TCC 835 
Leu Pro Ser Leu Glu Ser Leu Gln Arg Leu Phe Asp Gln Gln Leu Ser 
230 235 240 

CCG GGC CTC CGT CCA CGT CCT CAG GTT CCT GGT GAG GCC AAT CCC ATC 883 
Pro Gly Leu Arg Pro Arg Pro Gln Val Pro Gly Glu Ala Asn Pro Ile 
245 250 255 

AAC ATG GTG TCC AAG CTC AGC CAA CTG ACA AGC CTG TTG TCA TCC ATT 931 
Asn Met Val Ser Lys Leu Ser Gln Leu Thr Ser Leu Leu Ser Ser Ile 
260 265 270 

GAA GAC AAG GTC AAG GCC TTG CTG CAC GAG GGT CCT GAG TCT CCG CAC 979 
Glu Asp Lys Val Lys Ala Leu Leu His Glu Gly Pro Glu Ser Pro His 
275 280 285 

CGG CCC TCC CTT ATC CCT CCA GTC ACC TTT GAG GTG AAG GCA GAG TCT 1027 
Arg Pro Ser Leu Ile Pro Pro Val Thr Phe Glu Val Lys Ala Glu Ser 
290 295 300 305 

CTG GGG ATT CCT CAG AAA ATG CAG CTC AAA GTC GAC GTT GAG TCT GGG 1075 
Leu Gly Ile Pro Gln Lys Met Gln Leu Lys Val Asp Val Glu Ser Gly 
310 315 320 

AAA CTG ATC ATT AAG AAG TCC AAG GAT GGT TCT GAG GAC AAG TTC TAC 1123 
Lys Leu Ile Ile Lys Lys Ser Lys Asp Gly Ser Glu Asp Lys Phe Tyr 
325 330 335 

AGC CAC AAG AAA ATC CTG CAG CTC ATT AAG TCA CAG AAA TTT CTG AAT 1171 
Ser His Lys Lys Ile Leu Gln Leu Ile Lys Ser Gln Lys Phe Leu Asn 
340 345 350 

AAG TTG GTG ATC TTG GTG GAA ACA GAG AAG GAG AAG ATC CTG CGG AAG 1219 
Lys Leu Val Ile Leu Val Glu Thr Glu Lys Glu Lys Ile Leu Arg Lys 
355 360 365 

GAA TAT GTT TTT GCT GAC TCC AAA AAG AGA GAA GGC TTC TGC CAG CTC 1267 
Glu Tyr Val Phe Ala Asp Ser Lys Lys Arg Glu Gly Phe Cys Gln Leu 
370 375 380 385 

CTG CAG CAG ATG AAG AAC AAG CAC TCA GAG CAG CCG GAG CCC GAC ATG 1315 
Leu Gln Gln Met Lys Asn Lys His Ser Glu Gln Pro Glu Pro Asp Met 
390 395 400 

ATC ACC ATC TTC ATC GGC ACC TGG AAC ATG GGT AAC GCC CCC CCT CCC 1363 
Ile Thr Ile Phe Ile Gly Thr Trp Asn Met Gly Asn Ala Pro Pro Pro 
405 410 415 

AAG AAG ATC ACG TCC TGG TTT CTC TCC AAG GGG CAG GGA AAG ACG CGG 1411 
Lys Lys Ile Thr Ser Trp Phe Leu Ser Lys Gly Gln Gly Lys Thr Arg 
420 425 430 

GAC GAC TCT GCG GAC TAC ATC CCC CAT GAC ATT TAC GTG ATC GGC ACC 1459 
Asp Asp Ser Ala Asp Tyr Ile Pro His Asp Ile Tyr Val Ile Gly Thr 
435 440 445 

CAA GAG GAC CCC CTG AGT GAG AAG GAG TGG CTG GAG ATC CTC AAA CAC 1507 
Gln Glu Asp Pro Leu Ser Glu Lys Glu Trp Leu Glu Ile Leu Lys His 
450 455 460 465 

TCC CTG CAA GAA ATC ACC AGT GTG ACT TTT AAA ACA GTC GCC ATC CAC 1555 
Ser Leu Gln Glu Ile Thr Ser Val Thr Phe Lys Thr Val Ala Ile His 
470 475 480 

ACG CTC TGG AAC ATC CGC ATC GTG GTG CTG GCC AAG CCT GAG CAC GAG 1603 
Thr Leu Trp Asn Ile Arg Ile Val Val Leu Ala Lys Pro Glu His Glu 
485 490 495 

AAC CGG ATC AGC CAC ATC TGT ACT GAC AAC GTG AAG ACA GGC ATT GCA 1651 
Asn Arg Ile Ser His Ile Cys Thr Asp Asn Val Lys Thr Gly Ile Ala 
500 505 510 

AAC ACA CTG GGG AAC AAG GGA GCC GTG GGG GTG TCG TTC ATG TTC AAT 1699 
Asn Thr Leu Gly Asn Lys Gly Ala Val Gly Val Ser Phe Met Phe Asn 
515 520 525 

GGA ACC TCC TTA GGG TTC GTC AAC AGC CAC TTG ACT TCA GGA AGT GAA 1747 
Gly Thr Ser Leu Gly Phe Val Asn Ser His Leu Thr Ser Gly Ser Glu 
530 535 540 545 

AAG AAA CTC AGG CGA AAC CAA AAC TAT ATG AAC ATT CTC CGG TTC CTG 1795 
Lys Lys Leu Arg Arg Asn Gln Asn Tyr Met Asn Ile Leu Arg Phe Leu 
550 555 560 

GCC CTG GGC GAC AAG AAG CTG AGT CCC TTT AAC ATC ACT CAC CGC TTC 1843 
Ala Leu Gly Asp Lys Lys Leu Ser Pro Phe Asn Ile Thr His Arg Phe 
565 570 575 

ACG CAC CTC TTC TGG TTT GGG GAT CTT AAC TAC CGT GTG GAT CTG CCT 1891 
Thr His Leu Phe Trp Phe Gly Asp Leu Asn Tyr Arg Val Asp Leu Pro 
580 585 590 

ACC TGG GAG GCA GAA ACC ATC ATC CAA AAA ATC AAG CAG CAG CAG TAC 1939 
Thr Trp Glu Ala Glu Thr Ile Ile Gln Lys Ile Lys Gln Gln Gln Tyr 
595 600 605 

GCA GAC CTC CTG TCC CAC GAC CAG CTG CTC ACA GAG AGG AGG GAG CAG 1987 
Ala Asp Leu Leu Ser His Asp Gln Leu Leu Thr Glu Arg Arg Glu Gln 
610 615 620 625 

AAG GTC TTC CTA CAC TTC GAG GAG GAA GAA ATC ACG TTT GCC CCA ACC 2035 
Lys Val Phe Leu His Phe Glu Glu Glu Glu Ile Thr Phe Ala Pro Thr 
630 635 640 

TAC CGT TTT GAG AGA CTG ACT CGG GAC AAA TAC GCC TAC ACC AAG CAG 2083 
Tyr Arg Phe Glu Arg Leu Thr Arg Asp Lys Tyr Ala Tyr Thr Lys Gln 
645 650 655 

AAA GCG ACA GGG ATG AAG TAC AAC TTG CCT TCC TGG TGT GAC CGA GTC 2131 
Lys Ala Thr Gly Met Lys Tyr Asn Leu Pro Ser Trp Cys Asp Arg Val 
660 665 670 

CTC TGG AAG TCT TAT CCC CTG GTG CAC GTG GTG TGT CAG TCT TAT GGC 2179 
Leu Trp Lys Ser Tyr Pro Leu Val His Val Val Cys Gln Ser Tyr Gly 
675 680 685 

AGT ACC AGC GAC ATC ATG ACG AGT GAC CAC AGC CCT GTC TTT GCC ACA 2227 
Ser Thr Ser Asp Ile Met Thr Ser Asp His Ser Pro Val Phe Ala Thr 
690 695 700 705 

TTT GAG GCA GGA GTC ACT TCC CAG TTT GTC TCC AAG AAC GGT CCC GGG 2275 
Phe Glu Ala Gly Val Thr Ser Gln Phe Val Ser Lys Asn Gly Pro Gly 
710 715 720 

ACT GTT GAC AGC CAA GGA CAG ATT GAG TTT CTC AGG TGC TAT GCC ACA 2323 
Thr Val Asp Ser Gln Gly Gln Ile Glu Phe Leu Arg Cys Tyr Ala Thr 
725 730 735 

TTG AAG ACC AAG TCC CAG ACC AAA TTC TAC CTG GAG TTC CAC TCG AGC 2371 
Leu Lys Thr Lys Ser Gln Thr Lys Phe Tyr Leu Glu Phe His Ser Ser 
740 745 750 

TGC TTG GAG AGT TTT GTC AAG AGT CAG GAA GGA GAA AAT GAA GAA GGA 2419 
Cys Leu Glu Ser Phe Val Lys Ser Gln Glu Gly Glu Asn Glu Glu Gly 
755 760 765 

AGT GAG GGG GAG CTG GTG GTG AAG TTT GGT GAG ACT CTT CCA AAG CTG 2467 
Ser Glu Gly Glu Leu Val Val Lys Phe Gly Glu Thr Leu Pro Lys Leu 
770 775 780 785 

AAG CCC ATT ATC TCT GAC CCT GAG TAC CTG CTA GAC CAG CAC ATC CTC 2515 
Lys Pro Ile Ile Ser Asp Pro Glu Tyr Leu Leu Asp Gln His Ile Leu 
790 795 800 

ATC AGC ATC AAG TCC TCT GAC AGC GAC GAA TCC TAT GGC GAG GGC TGC 2563 
Ile Ser Ile Lys Ser Ser Asp Ser Asp Glu Ser Tyr Gly Glu Gly Cys 
805 810 815 

ATT GCC CTT CGG TTA GAG GCC ACA GAA ACG CAG CTG CCC ATC TAC ACG 2611 
Ile Ala Leu Arg Leu Glu Ala Thr Glu Thr Gln Leu Pro Ile Tyr Thr 
820 825 830 

CCT CTC ACC CAC CAT GGG GAG TTG ACA GGC CAC TTC CAG GGG GAG ATC 2659 
Pro Leu Thr His His Gly Glu Leu Thr Gly His Phe Gln Gly Glu Ile 
835 840 845 

AAG CTG CAG ACC TCT CAG GGC AAG ACG AGG GAG AAG CTC TAT GAC TTT 2707 
Lys Leu Gln Thr Ser Gln Gly Lys Thr Arg Glu Lys Leu Tyr Asp Phe 
850 855 860 865 

GTG AAG ACG GAG CGT GAT GAA TCC AGT GGG CCA AAG ACC CTG AAG AGC 2755 
Val Lys Thr Glu Arg Asp Glu Ser Ser Gly Pro Lys Thr Leu Lys Ser 
870 875 880 

CTC ACC AGC CAC GAC CCC ATG AAG CAG TGG GAA GTC ACT AGC AGG GCC 2803 
Leu Thr Ser His Asp Pro Met Lys Gln Trp Glu Val Thr Ser Arg Ala 
885 890 895 

CCT CCG TGC AGT GGC TCC AGC ATC ACT GAA ATC ATC AAC CCC AAC TAC 2851 
Pro Pro Cys Ser Gly Ser Ser Ile Thr Glu Ile Ile Asn Pro Asn Tyr 
900 905 910 

ATG GGA GTG GGG CCC TTT GGG CCA CCA ATG CCC CTG CAC GTG AAG CAG 2899 
Met Gly Val Gly Pro Phe Gly Pro Pro Met Pro Leu His Val Lys Gln 
915 920 925 

ACC TTG TCC CCT GAC CAG CAG CCC ACA GCC TGG AGC TAC GAC CAG CCG 2947 
Thr Leu Ser Pro Asp Gln Gln Pro Thr Ala Trp Ser Tyr Asp Gln Pro 
930 935 940 945 

CCC AAG GAC TCC CCG CTG GGG CCC TGC AGG GGA GAA AGT CCT CCG ACA 2995 
Pro Lys Asp Ser Pro Leu Gly Pro Cys Arg Gly Glu Ser Pro Pro Thr 
950 955 960 

CCT CCC GGC CAG CCG CCC ATA TCA CCC AAG AAG TTT TTA CCC TCA ACA 3043 
Pro Pro Gly Gln Pro Pro Ile Ser Pro Lys Lys Phe Leu Pro Ser Thr 
965 970 975 

GCA AAC CGG GGT CTC CCT CCC AGG ACA CAG GAG TCA AGG CCC AGT GAC 3091 
Ala Asn Arg Gly Leu Pro Pro Arg Thr Gln Glu Ser Arg Pro Ser Asp 
980 985 990 

CTG GGG AAG AAC GCA GGG GAC ACG CTG CCT CAG GAG GAC CTG CCG CTG 3139 
Leu Gly Lys Asn Ala Gly Asp Thr Leu Pro Gln Glu Asp Leu Pro Leu 
995 1000 1005 

ACG AAG CCC GAG ATG TTT GAG AAC CCC CTG TAT GGG TCC CTG AGT TCC 3187 
Thr Lys Pro Glu Met Phe Glu Asn Pro Leu Tyr Gly Ser Leu Ser Ser 
1010 1015 1020 1025 

TTC CCT AAG CCT GCT CCC AGG AAG GAC CAG GAA TCC CCC AAA ATG CCG 3235 
Phe Pro Lys Pro Ala Pro Arg Lys Asp Gln Glu Ser Pro Lys Met Pro 
1030 1035 1040 

CGG AAG GAA CCC CCG CCC TGC CCG GAA CCC GGC ATC TTG TCG CCC AGC 3283 
Arg Lys Glu Pro Pro Pro Cys Pro Glu Pro Gly Ile Leu Ser Pro Ser 
1045 1050 1055 

ATC GTG CTC ACC AAA GCC CAG GAG GCT GAT CGC GGC GAG GGG CCC GGC 3331 
Ile Val Leu Thr Lys Ala Gln Glu Ala Asp Arg Gly Glu Gly Pro Gly 
1060 1065 1070 

AAG CAG GTG CCC GCG CCC CGG CTG CGC TCC TTC ACG TGC TCA TCC TCT 3379 
Lys Gln Val Pro Ala Pro Arg Leu Arg Ser Phe Thr Cys Ser Ser Ser 
1075 1080 1085 

GCC GAG GGC AGG GCG GCC GGC GGG GAC AAG AGC CAA GGG AAG CCC AAG 3427 
Ala Glu Gly Arg Ala Ala Gly Gly Asp Lys Ser Gln Gly Lys Pro Lys 
1090 1095 1100 1105 

ACC CCG GTC AGC TCC CAG GCC CCG GTG CCG GCC AAG AGG CCC ATC AAG 3475 
Thr Pro Val Ser Ser Gln Ala Pro Val Pro Ala Lys Arg Pro Ile Lys 
1110 1115 1120 

CCT TCC AGA TCG GAA ATC AAC CAG CAG ACC CCG CCC ACC CCG ACG CCG 3523 
Pro Ser Arg Ser Glu Ile Asn Gln Gln Thr Pro Pro Thr Pro Thr Pro 
1125 1130 1135 

CGG CCG CCG CTG CCA GTC AAG AGC CCG GCG GTG CTG CAC CTC CAG CAC 3571 
Arg Pro Pro Leu Pro Val Lys Ser Pro Ala Val Leu His Leu Gln His 
1140 1145 1150 

TCC AAG GGC CGC GAC TAC CGC GAC AAC ACC GAG CTC CCG CAT CAC GGC 3619 
Ser Lys Gly Arg Asp Tyr Arg Asp Asn Thr Glu Leu Pro His His Gly 
1155 1160 1165 

AAG CAC CGG CCG GAG GAG GGG CCA CCA GGG CCT CTA GGC AGG ACT GCC 3667 
Lys His Arg Pro Glu Glu Gly Pro Pro Gly Pro Leu Gly Arg Thr Ala 
1170 1175 1180 1185 

ATG CAG TGAAGCCCTC AGTGAGCTGC CACTGAGTCG GGAGCCCAGA GGAACGGCGT 3723 
Met Gln 



GAAGCCACTG GACCCTCTCC CGGGACCTCC TGCTGGCTCC TCCTGCCCAG CTTCCTATGC 3783 

AAGGCTTTGT GTTTTCAGGA AAGGGCCTAG CTTCTGTGTG GCCCACAGAG TTCACTGCCT 3843 

GTGAGGCTTA GCACCAAGTG CTGAGGCTGG AAGAAAAACG CACACCAGAC GGGCAACAAA 3903 

CAGTCTGGGT CCCCAGCTCG CTCTTGGTAC TTGGGACCCC AGTGCCTCGT TGAGGGCGCC 3963 

ATTCTGAAGA AAGGAACTGC AGCGCCGATT TGAGGGTGGA GATATAGATA ATAATAATAT 4023 

TAATAATAAT AATGGCCACA TGGATCGAAC ACTCATGATG TGCCAAGTGC TGTGCTAAGT 4083 

GCTTTACGAA CATTCGTCAT ATCAGGATGA CCTCGAGAGC TGAGGCTCTA GCCACCTAAA 4143 

ACACGTGCCC AAACCCACCA GTTTAAAACG GTGTGTGTTC GGAGGGGTGA AAGCATTAAG 4203 

AAGCCCAGTG CCCTCCTGGA GTGAGACAAG GGCTCGGCCT TAAGGAGCTG AAGAGTCTGG 4263 

GTAGCTTGTT TAGGGTACAA GAAGCCTGTT CTGTCCAGCT TCAGTGACAC AAGCTGCTTT 4323 

AGCTAAAGTC CCGCGGGTTC CGGCATGGCT AGGCTGAGAG CAGGGATCTA CCTGGCTTCT 4383 

CAGTTCTTTG GTTGGAAGGA GCAGGAAATC AGCTCCTATT CTCCAGTGGA GAGATCTGGC 4443 

CTCAGCTTGG GCTAGAGATG CCAAGGCCTG TGCCAGGTTC CCTGTGCCCT CCTCGAGGTG 4503 

GGCAGCCATC ACCAGCCACA GTTAAGCCAA GCCCCCCAAC ATGTATTCCA TCGTGCTGGT 4563 

AGAAGAGTCT TTGCTGTTGC TCCCGAAAGC CGTGCTCTCC AGCCTGGCTG CCAGGGAGGG 4623 

TGGGCCTCTT GGTTCCAGGC TCTTGAAATA GTGCAGCCTT TTCTTCCTAT CTCTGTGGCT 4683 

TTCAGCTCTG CTTCCTTGGT TATTAGGAGA ATAGATGGGT GATGTCTTTC CTTATGTTGC 4743 

TTTTTCAACA TAGCAGAATT AATGTAGGGA GCTAAATCCA GTGGTGTGTG TGAATGCAGA 4803 

AGGGAATGCA CCCCACATTC CCATGATGGA AGTCTGCGTA ACCAATAAAT TGTGCCTTTC 4863 



TTAAAAA 4870 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1187 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Val Pro Cys Trp Asn His Gly Asn Ile Thr Arg Ser Lys Ala Glu Glu 
1 5 10 15 

Leu Leu Cys Arg Thr Gly Lys Asp Gly Ser Phe Leu Val Arg Ala Ser 
20 25 30 

Glu Ser Ile Phe Arg Ala Tyr Ala Leu Cys Val Leu Tyr Arg Asn Cys 
35 40 45 

Val Tyr Thr Tyr Arg Ile Leu Pro Asn Glu Asp Asp Lys Phe Thr Val 
50 55 60 

Gln Ala Ser Glu Gly Val Ser Met Arg Phe Phe Thr Lys Leu Asp Gln 
65 70 75 80 

Leu Ile Glu Phe Tyr Lys Lys Glu Asn Met Gly Leu Val Thr His Leu 
85 90 95 

Gln Tyr Pro Val Pro Leu Glu Glu Glu Asp Thr Gly Asp Asp Pro Glu 
100 105 110 

Glu Asp Thr Glu Ser Val Val Ser Pro Pro Glu Leu Pro Pro Arg Asn 
115 120 125 

Ile Pro Leu Thr Ala Ser Ser Cys Glu Ala Lys Glu Val Pro Phe Ser 
130 135 140 

Asn Glu Asn Pro Arg Ala Thr Glu Thr Ser Arg Pro Ser Leu Ser Glu 
145 150 155 160 

Thr Leu Phe Gln Arg Leu Gln Ser Met Asp Thr Ser Gly Leu Pro Glu 
165 170 175 

Glu His Leu Lys Ala Ile Gln Asp Tyr Leu Ser Thr Gln Leu Ala Gln 
180 185 190 

Asp Ser Glu Phe Val Lys Thr Gly Ser Ser Ser Leu Pro His Leu Lys 
195 200 205 

Lys Leu Thr Thr Leu Leu Cys Lys Glu Leu Tyr Gly Glu Val Ile Arg 
210 215 220 

Thr Leu Pro Ser Leu Glu Ser Leu Gln Arg Leu Phe Asp Gln Gln Leu 
225 230 235 240 

Ser Pro Gly Leu Arg Pro Arg Pro Gln Val Pro Gly Glu Ala Asn Pro 
245 250 255 

Ile Asn Met Val Ser Lys Leu Ser Gln Leu Thr Ser Leu Leu Ser Ser 
260 265 270 

Ile Glu Asp Lys Val Lys Ala Leu Leu His Glu Gly Pro Glu Ser Pro 
275 280 285 

His Arg Pro Ser Leu Ile Pro Pro Val Thr Phe Glu Val Lys Ala Glu 
290 295 300 

Ser Leu Gly Ile Pro Gln Lys Met Gln Leu Lys Val Asp Val Glu Ser 
305 310 315 320 

Gly Lys Leu Ile Ile Lys Lys Ser Lys Asp Gly Ser Glu Asp Lys Phe 
325 330 335 

Tyr Ser His Lys Lys Ile Leu Gln Leu Ile Lys Ser Gln Lys Phe Leu 
340 345 350 

Asn Lys Leu Val Ile Leu Val Glu Thr Glu Lys Glu Lys Ile Leu Arg 
355 360 365 

Lys Glu Tyr Val Phe Ala Asp Ser Lys Lys Arg Glu Gly Phe Cys Gln 
370 375 380 

Leu Leu Gln Gln Met Lys Asn Lys His Ser Glu Gln Pro Glu Pro Asp 
385 390 395 400 

Met Ile Thr Ile Phe Ile Gly Thr Trp Asn Met Gly Asn Ala Pro Pro 
405 410 415 

Pro Lys Lys Ile Thr Ser Trp Phe Leu Ser Lys Gly Gln Gly Lys Thr 
420 425 430 

Arg Asp Asp Ser Ala Asp Tyr Ile Pro His Asp Ile Tyr Val Ile Gly 
435 440 445 

Thr Gln Glu Asp Pro Leu Ser Glu Lys Glu Trp Leu Glu Ile Leu Lys 
450 455 460 

His Ser Leu Gln Glu Ile Thr Ser Val Thr Phe Lys Thr Val Ala Ile 
465 470 475 480 

His Thr Leu Trp Asn Ile Arg Ile Val Val Leu Ala Lys Pro Glu His 
485 490 495 

Glu Asn Arg Ile Ser His Ile Cys Thr Asp Asn Val Lys Thr Gly Ile 
500 505 510 

Ala Asn Thr Leu Gly Asn Lys Gly Ala Val Gly Val Ser Phe Met Phe 
515 520 525 

Asn Gly Thr Ser Leu Gly Phe Val Asn Ser His Leu Thr Ser Gly Ser 
530 535 540 

Glu Lys Lys Leu Arg Arg Asn Gln Asn Tyr Met Asn Ile Leu Arg Phe 
545 550 555 560 

Leu Ala Leu Gly Asp Lys Lys Leu Ser Pro Phe Asn Ile Thr His Arg 
565 570 575 

Phe Thr His Leu Phe Trp Phe Gly Asp Leu Asn Tyr Arg Val Asp Leu 
580 585 590 

Pro Thr Trp Glu Ala Glu Thr Ile Ile Gln Lys Ile Lys Gln Gln Gln 
595 600 605 

Tyr Ala Asp Leu Leu Ser His Asp Gln Leu Leu Thr Glu Arg Arg Glu 
610 615 620 

Gln Lys Val Phe Leu His Phe Glu Glu Glu Glu Ile Thr Phe Ala Pro 
625 630 635 640 

Thr Tyr Arg Phe Glu Arg Leu Thr Arg Asp Lys Tyr Ala Tyr Thr Lys 
645 650 655 

Gln Lys Ala Thr Gly Met Lys Tyr Asn Leu Pro Ser Trp Cys Asp Arg 
660 665 670 

Val Leu Trp Lys Ser Tyr Pro Leu Val His Val Val Cys Gln Ser Tyr 
675 680 685 

Gly Ser Thr Ser Asp Ile Met Thr Ser Asp His Ser Pro Val Phe Ala 
690 695 700 

Thr Phe Glu Ala Gly Val Thr Ser Gln Phe Val Ser Lys Asn Gly Pro 
705 710 715 720 

Gly Thr Val Asp Ser Gln Gly Gln Ile Glu Phe Leu Arg Cys Tyr Ala 
725 730 735 

Thr Leu Lys Thr Lys Ser Gln Thr Lys Phe Tyr Leu Glu Phe His Ser 
740 745 750 

Ser Cys Leu Glu Ser Phe Val Lys Ser Gln Glu Gly Glu Asn Glu Glu 
755 760 765 

Gly Ser Glu Gly Glu Leu Val Val Lys Phe Gly Glu Thr Leu Pro Lys 
770 775 780 

Leu Lys Pro Ile Ile Ser Asp Pro Glu Tyr Leu Leu Asp Gln His Ile 
785 790 795 800 

Leu Ile Ser Ile Lys Ser Ser Asp Ser Asp Glu Ser Tyr Gly Glu Gly 
805 810 815 

Cys Ile Ala Leu Arg Leu Glu Ala Thr Glu Thr Gln Leu Pro Ile Tyr 
820 825 830 

Thr Pro Leu Thr His His Gly Glu Leu Thr Gly His Phe Gln Gly Glu 
835 840 845 

Ile Lys Leu Gln Thr Ser Gln Gly Lys Thr Arg Glu Lys Leu Tyr Asp 
850 855 860 

Phe Val Lys Thr Glu Arg Asp Glu Ser Ser Gly Pro Lys Thr Leu Lys 
865 870 875 880 

Ser Leu Thr Ser His Asp Pro Met Lys Gln Trp Glu Val Thr Ser Arg 
885 890 895 

Ala Pro Pro Cys Ser Gly Ser Ser Ile Thr Glu Ile Ile Asn Pro Asn 
900 905 910 

Tyr Met Gly Val Gly Pro Phe Gly Pro Pro Met Pro Leu His Val Lys 
915 920 925 

Gln Thr Leu Ser Pro Asp Gln Gln Pro Thr Ala Trp Ser Tyr Asp Gln 
930 935 940 

Pro Pro Lys Asp Ser Pro Leu Gly Pro Cys Arg Gly Glu Ser Pro Pro 
945 950 955 960 

Thr Pro Pro Gly Gln Pro Pro Ile Ser Pro Lys Lys Phe Leu Pro Ser 
965 970 975 

Thr Ala Asn Arg Gly Leu Pro Pro Arg Thr Gln Glu Ser Arg Pro Ser 
980 985 990 

Asp Leu Gly Lys Asn Ala Gly Asp Thr Leu Pro Gln Glu Asp Leu Pro 
995 1000 1005 

Leu Thr Lys Pro Glu Met Phe Glu Asn Pro Leu Tyr Gly Ser Leu Ser 
1010 1015 1020 

Ser Phe Pro Lys Pro Ala Pro Arg Lys Asp Gln Glu Ser Pro Lys Met 
1025 1030 1035 1040 

Pro Arg Lys Glu Pro Pro Pro Cys Pro Glu Pro Gly Ile Leu Ser Pro 
1045 1050 1055 

Ser Ile Val Leu Thr Lys Ala Gln Glu Ala Asp Arg Gly Glu Gly Pro 
1060 1065 1070 

Gly Lys Gln Val Pro Ala Pro Arg Leu Arg Ser Phe Thr Cys Ser Ser 
1075 1080 1085 

Ser Ala Glu Gly Arg Ala Ala Gly Gly Asp Lys Ser Gln Gly Lys Pro 
1090 1095 1100 

Lys Thr Pro Val Ser Ser Gln Ala Pro Val Pro Ala Lys Arg Pro Ile 
1105 1110 1115 1120 

Lys Pro Ser Arg Ser Glu Ile Asn Gln Gln Thr Pro Pro Thr Pro Thr 
1125 1130 1135 

Pro Arg Pro Pro Leu Pro Val Lys Ser Pro Ala Val Leu His Leu Gln 
1140 1145 1150 

His Ser Lys Gly Arg Asp Tyr Arg Asp Asn Thr Glu Leu Pro His His 
1155 1160 1165 

Gly Lys His Arg Pro Glu Glu Gly Pro Pro Gly Pro Leu Gly Arg Thr 
1170 1175 1180 

Ala Met Gln 
1185 

I CLAIM: 

1. A purified and isolated nucleic acid molecule comprising a sequence encoding an SH2- 
containing inositol-phosphatase which has a src homology 2 (SH2) domain and exhibits 
phosphoIns-5-ptase activity. 

2. An SH2-containing inositol-phosphatase as claimed in claim 1 which is further 
characterized by having an amino terminal src homology 2 (SH2) domain, two 
phosphotyrosine binding (PTB) consensus sequences, a proline rich region, and motifs highly 
conserved among inositol polyphosphate-5-phosphatases (phosphoIns-5-ptases). 

3. A purified and isolated nucleic acid molecule as claimed in claim 1, comprising (i) a 
nucleic acid sequence encoding an SH2-containing inositol-phosphatase having the amino acid 
sequence as shown in SEQ ID NO:2 or Figure 2(A); or, (ii) nucleic acid sequences complementary 
to (i). 

4. A purified and isolated nucleic acid molecule as claimed in claim 1, comprising (i) a 
nucleic acid sequence encoding an SH2-containing inositol-phosphatase having the amino acid 
sequence as shown in SEQ ID NO:8 or Figure 11; or, (ii) nucleic acid sequences complementary to 
(i). 

5. A purified and isolated nucleic acid molecule as claimed in claim 1, comprising (i) a 
nucleic acid sequence encoding an SH2-containing inositol-phosphatase having the nucleic acid 
sequence as shown in SEQ ID NO:1 or Figure 3, wherein T can also be U; 
(ii) a nucleic acid sequence complementary to (i); or 
(iii) a nucleic acid molecule differing from any of the nucleic acids of (i) and (ii) in 
codon sequences due to the degeneracy of the genetic code. 

6. A purified and isolated nucleic acid molecule as claimed in claim 1, comprising (i) a 
nucleic acid sequence encoding an SH2-containing inositol-phosphatase having the nucleic acid 
sequence as shown in SEQ ID NO:7 or Figure 10, wherein T can also be U; 
(ii) a nucleic acid sequence complementary to (i); or 
(iii) a nucleic acid molecule differing from any of the nucleic acids of (i) and (ii) in 
codon sequences due to the degeneracy of the genetic code. 

7. A purified and isolated nucleic acid molecule comprising a sequence which hybridizes 
under high stringency conditions to the nucleic acid molecule as claimed in claim 5. 

8. A purified and isolated nucleic acid molecule as claimed in claim 1, which is a double 
stranded nucleic acid molecule or RNA. 

9. A recombinant expression vector adapted for transformation of a host cell comprising a 
nucleic acid molecule as claimed in claim 1 and one or more transcription and translation 
elements operatively linked to the nucleic acid molecule. 

10. A host cell containing a recombinant expression vector as claimed in claim 9. 

11. A method for preparing an SH2-containing inositol-phosphatase comprising (a) 
transferring a recombinant expression vector as claimed in claim 9 into a host cell; (b) selecting 
transformed host cells from untransformed host cells; (c) culturing a selected transformed host 
cell under conditions which allow expression of the SH2-containing inositol-phosphatase; and 
(d) isolating the SH2-containing inositol-phosphatase. 

12. A purified and isolated SH2-containing inositol-phosphatase which associates with 
Shc and exhibits phosphoIns-5-ptase activity. 

13. A purified and isolated Shc protein as claimed in claim 12, which has the amino acid 
sequence as shown in SEQ ID NO:2 or Figure 2(A), or as shown in SEQ ID NO:8 or Figure 11. 

14. Antibodies having specificity against an epitope of the SH2-containing inositol- 
phosphatase as claimed in claim 13. 

15. A nucleotide probe comprising a sequence encoding at least 6 continuous amino acids 
from the SH2-containing inositol-phosphatase as shown in SEQ ID NO:2 or Figure 2(A), or 
as shown in SEQ ID NO:8 or Figure 11. 

16. A method for identifying a substance which is capable of binding to a purified and 
isolated SH2-containing inositol-phosphatase protein as claimed in claim 12, comprising 
reacting the protein with at least one substance which potentially can bind with the protein 
under conditions which permit the formation of complexes between the substance and the 
protein; and, assaying for complexes, for free substance, for non-complexed protein, or for 
activation of the protein. 

17. A method for assaying a medium for the presence of an agonist or antagonist of the 
interaction of a purified and isolated SH2-containing inositol-phosphatase protein as claimed 
in claim 12 and a substance which binds to the protein which comprises reacting the protein 
with a substance which is capable of binding to the protein and a suspected agonist or 
antagonist substance, under conditions which permit the formation of complexes between the 
substance and the protein; and, assaying for complexes, for free substance, for non-complexed 
protein, or for activation of the protein. 

18. A method as claimed in claim 17, wherein the substance is Shc or a part thereof. 

19. A method for assaying for the effect of a substance on the phosphoIns-5-ptase activity 
of an SH2-containing inositol-phosphatase protein as claimed in claim 12 comprising reacting a 
substrate which is capable of being hydrolyzed by the protein to produce a hydrolysis product, 
with a substance which is suspected of affecting the phosphoIns-5-ptase activity of the 
protein, under conditions which permit the hydrolysis of the substrate; determining the 
amount of hydrolysis product; and, comparing the amount of product obtained with the amount 
obtained in the absence of the substance to determine the effect of the substance on the 
phosphoIns-5-ptase activity of the protein. 

20. A substance identified in accordance with the method of claim 16, 17, 18 or 19. 

21. A pharmaceutical composition comprising a substance identified in accordance with 
the method of claim 16. 
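
By way of illustration only, the codon degeneracy referred to in claims 5 and 6, and the length of a probe encoding 6 amino acids as in claim 15, may be sketched in Python as follows. The table is the standard genetic code. The two short sequences, the hexapeptide and the helper names (translate, synonymous_codons, count_encodings) are hypothetical and are not taken from the specification.

```python
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a DNA or RNA coding sequence (U is read as T)."""
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_encodings(peptide):
    """Number of distinct nucleotide sequences encoding the peptide."""
    return prod(len(synonymous_codons(aa)) for aa in peptide)


# Hypothetical example: two sequences that differ only in degenerate positions.
original = "ATGGAAGCCATC"
variant = "AUGGAGGCUAUU"  # RNA form with synonymous codons
print(translate(original), translate(variant))

# A sequence encoding 6 amino acids spans 6 * 3 = 18 nucleotides.
hexapeptide = "MEAIAK"  # hypothetical
print(len(hexapeptide) * 3, count_encodings(hexapeptide))
```

Both sequences translate to the same peptide although they differ at three positions, which is the sense in which molecules may differ "in codon sequences due to the degeneracy of the genetic code".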

FIGURE 1 

FIGURE 2 

A 

1 MPAMVPGWNHGNITRSKAEELLSRAGKDGSFLVRASESIPRACALCVLFR 
51 NCVYTYRILPNEDDKFTVQASEGVPMRFFTKLDQLIDFYKKENMGLVTHL 
101 QYPVPLEEEDAIDEAEEDTESVMSPPELPPRNIPMSAGPSEAKDLPLATE 
151 NPRAPEVTRLSLSETLFQRLQSMDTSGLPEEHLKAIQDYLSTQLLLDSDF 
201 LKTGSSNLPHLKKLMSLLCKELHGEVIRTLPSLESLQRLFDQQLSPGLRP 
251 RPQVPGEASPITMVAKLSQLTSLLSSIEDKVKSLLHEGSESTNRRSLIPP 
301 VTFEVKSESLGIPQKMHLKVDVESGKLIVKKSKDGSEDKFYSHKKILQLI 
351 KSQKHJ^IKLVILVFreKEKILJlKEYW 

401 PEPDMITIFIGTWNMGNAPPPKKITSWFLSKGQGKTRDDSADYIPHDIYV 
451 IGTQEDPLGEKEWLELLRHSLQEVTSMTFKTVAIHTLWNIRIVVLAKPEH 
501 ENRISHICTDNVKTGIANTLGNKGAVGVSFMFNGTSLGFVNSHLTSGSEK 
551 KLRRNQNYMNILRFLALGDKKLSPFNITHRFTHLFWLGDLNYRVELPTWE 
601 AEAIIQKIKQQQYSDLLAHDQLLLERKDQKVFLHFEEEEITFAPTYRFER 
651 LTRDKYAYTKQKATGMKYNLPSWCDRVLWKSYPLVHVVCQSYGSTSDIMT 
701 SDHSPVFATFEAGVTSQFVSKNGPGTVDSQGQIEFLACYATLKTKSQTKF 
751 YLEFHSSCLESFVKSQEGENEEGSEGELVVRFGETLPKLKPIISDPEYLL 
801 DQHILISIKSSDSDESYGEGCIALRLETTEAQHPIYTPLTHHGEMTGHFR 
851 GEIKLQTSQGKMREKLYDFVKTERDESSGMKCLKNLTSHDPMRQWEPSGR 

901 VPACGVSSLNEMINPNYIGMGPFGQPLHGKSTLSPDQQLTAWSYDQLPKD 

• • • . • 

951 SSLGPGRGEGPPTPPSQPPLSPKKFSSSTTNRGPCPRVQEARPGDLGKVE 
1001 ALLQEDLLLTKPEMFENPLYGSVSSFPKLVPRKEQESPKMLRKEPPPCPD 
1051 PGISSPSIVLPKAQEVESVKGTSKQAPVPVLGPTPRIRSFTCSSSAEGRM 

• • • • • * 

1101 TSGDKSQGKPKASASSQAPVPVKRPVKPSRSEMSQQTTPIPAPRPPLPVK 
1151 SPAVLQLQHSKGRDYRDNTELPHHGKHRQEEGLLGRTAMQ 

FIGURE 3 

>BASE COUNT 1014 a 1147 c 1054 g 825 t 
>ORIGIN 

> 1 ccctggtagg agcagcagag gcaatttctg agaggcaaca ggcggcaggt ctcagcctag 

> 61 agagggccct gaactacttt gctggagtgt ccgtcctggg agtggctgct gacccagtcc 

> 121 aggagaccca tocctocc at oa tccctQQQ tgpaaccatg gcaacatcac ccgctccaag 

> 181 gcagaggagc tactttccag agccggcaag gacgggagct tccttgtgcg tgccagcgag 

> 241 tccatccccc gggcctgcgc actctgcgtg etgttccgga attgtgttta cacttacagg 

> 301 attctgcoca atgaggacga taaattcact gttcaggcat ccgaaggtgt ccccatgagg 

> 361 ttcttcacga agctggacca gctcatcgac ttttacaaga aggaaaacat ggggctggtg 

> 421 acccacctgc agtaccccgt gcccctggag gaggaggatg ctattgatga ggctgaggag 

> 481 gacactgaaa gtgtcatgtc accacctgag ctgoctcoca gaaacattcc tatgtctgcc 

> 541 gggcccagcg aggccaagga ccttcctctt gcaacagaga acccccgagc ccctgaggtc 

> 601 acccggctga gtctctccga gacactgttt cagcgtctac agagcatgga taccagtggg 

> 661 cttcccgagg agcacctgaa agccatccag gattatctga gcactcagct octcctggat 

> 721 tccgactttt tgaaaacggg ctccagcaac ctccctcacc tgaagaagct gatgtcactg 

> 781 ctctgcaagg agctccatgg ggaagtcatc aggactctgc catccctgga gtctctgcag 

> 841 aggttgtttg accaacagct ctccccaggc cttcgcccac gacctcaggt gcccggagag 

> 901 gccagtccca tcaccatggt tgccaaactc agccaattga caagtctgct gtcttccatt 

> 961 gaagataagg tcaagtcctt gctgcacgag ggctcagaat ctaccaacag gcgttccctt 

> 1021 atccctccgg tcaoctttga ggtgaagtca gagtocctgg gcattcctca gaaaatgcat 

> 1081 ctcaaagtgg acgttgagtc tgggaaactg atcgttaaga agtccaagga tggttctgag 

> 1141 gacaagttct »rflgrr«rtt* aaaaatcctg «tg^?tta agfr^frflflfi g rngt aflflc 

> 1201 aagttggtga ttttggtgga gacggagaag gagaaaatcc tgaggaagga atatgttttt 

> 1261 gctgactcta agaaaagaga aggcttctgt caactcctgc agcagatgaa gaacaagcat 

> 1321 tcggagcagc cagagcctga catgatoacc atcttcattg gcacttggaa catgggtaat 

> 1381 gcaccccctc ccaagaagat cacgtoctgg tttctctoca aggggcaggg aaagacacgg 

> 1441 gacgactctg ctgactacat cccccatgac atctatgtga ttggcaccca ggaggatccc 

> 1501 cttggagaga aggagtggct ggagctadc aggcactccc tgcaagaagt caccagcatg 

> 1561 acatttaaaa cagttgccat ccacaccctc tggaacattc gcatagtggt gcttgccaag 

> 1621 ccagagcatg agaatcggat cagccatatc tgcactgaca acgtgaagac aggcatcgcc 

> 1681 aacaccctgg gaaacaaggg agcagtggga gtgtccttca tgttcaatgg aacctccttg 

> 1741 gggttcgtca acagccactt gacttctgga agtgaaaaaa agctcaggag aaatcaaaac 

> 1801 tatatgaaca tcctgcggtt cctggccctg ggagacaaga agctaagccc atttaacatc 

> 1861 acccaccgct to aoccac ct cttctggctt ggggatctca actaccgcgt ggagctgccc 

> 1921 acttgggagg cagaggccat catccagaag atcaagcaac agcagtattc agaccttctg 

> 1981 gcccacgacc aactgctcct ggagaggaag gaccagaagg tcttcctgca ctttgaggag 

> 2041 gaagagatca ccttcgococ cacctatcga tttgaaagac tgacccggga caagtatgca 

> 2101 tacacgaagc agaaagcaac agggatgaag tacaacttgc cgtcctggtg cgaccgagtc 

> 2161 ctctggaagt cttacccgct ggtgcatgtg gtctgtcagt cctatggcag taccagtgac 

> 2221 atcatgacga gtgaccacag ccctgtcttt gccacgtttg aagcaggagt cacatctcaa 

> 2281 ttcgtctcca agaatggtcc tggcactgta gatagccaag ggcagatcga gtttcttgca 

> 2341 tgctacgcca cactgaagac caagtoccag actaagttct acttggagtt ccactcaagc 

> 2401 tgcttagaga gttttgtcaa gagtcaggaa ggagagaatg aagagggaag tgaaggagag 

> 2461 ctggtggtac ggtttggaga gactcttccc aagctaaagc ccattatctc tgaccccgag 

> 2521 tacttactgg accagcatat cctgatcagc cttaaatcct ctgacagtga cgagtcctat 

> 2581 ggtgaaggct gcattgccct tcgcttggag accacagagg ctcagcatcc tatctacacg 

> 2641 cctctcaccc accatgggga gatgactggc cacttcaggg gagagattaa gctgcagacc 

> 2701 tcccagggca agatgaggga gaagctctat gactttgtga agacagagcg ggatgaatcc 

> 2761 agtggaatga aatgcttgaa gaa cc tcacc agccatgacc ctatgaggca atgggagcct 

> 2821 tctggcaggg tccctgcatg tggtgtctcc agcctcaatg agatgatcaa tccaaactac 

> 2881 attggtatgg ggccttttgg acagcccctg calgggaaat caaccctgtc cccagatcag 

> 2941 caactca ca g cttggagtta tgaccagcta cocaaagact octccctggg goctgggagg 

> 3001 ggggagggtc ctccaacooc tcoctoocaa ccacctctgt cgccaaagaa gttttcatct 

> 3061 tccacaacca accgaggtcc ctgccc ca gg gtgcaagagg caagacctgg ggatetggga 

> 3121 aaggtggaag ctctgctcca ggaggacctg ctgctgacga agcccgagat gtttgagaac 

> 3181 ccactgtatg gatecgtgag ttccttccct aagctggtgc ccaggaaaga gcaggagtct 

> 3241 cccaagatgc tgcggaagga goocoogocc tgtccagacc caggaatctc atcacccagc 

> 3301 atcgtgctcc c caaag ccc a agaggtggag agtgtcaagg ggacaagcaa acaggcccct 

> 3361 gtgcctgtcc ttggocccac accccggatc cgctocttta cctgttcttc ttctgctgag 

> 3421 ggcagaatga ccagtgggga caagagccaa gggaagcoca aggcctcagc cagttcccaa 

> 3481 gccc ca gtgc cagtcaagag gcctgtcaag ccttccaggt cagaaatgag ccagcagaca 

> 3541 acarccatoc cagctocacg gocacccctg ocagtcaaga gtcctgctgt cctgcagctg 

> 3601 caacattcca aaggcagaga ctaocgtgac aacacagaac t ccccca cca tggcaagcac 

> 3661 cgccaagagg aggggctgct tggcaggact gccatgcagt gagctgctgg tgatcggagc 

> 3721 ctggaggaac agcacaaagc agaoctgcga octctctcag gatgoctctc tcaggatgcc 

> 3781 tcttggagga cctoctgcta gctcttcttg cctagcttca agtoccaggc tgtgtatttt 

> 3841 ttttcaggaa acggcctcac ttctctgtgg tocaagaagt gtgctgctgg ctgccacact 

> 3901 gtgcggcaga tgctaaagct g gat g aca M i cg ca cg c cat aca g acag ca gacagcggca 

> 3961 ctgggtctca gaa cttg gat t oc tggg oct Icttacagtc gocgttttaa agaaaggaac 

FIGURE 5 

[Image residue: lanes 1 to 7; size marker at 7.5 kb] 

WO 97/12039 PCT/CA96/00655 

[Plot residue: rotated axis label, apparently "Ins-P3 (pmol/min)"] 

FIGURE 6 CONT'D 

FIGURE 7 



Gene Locus: SHC1 gi|134475: 1..473 

Organism HOMO SAPIENS (HUMAN) gi|134475: 1..473 

Sequence 473 aa 

1 mnklsggggr rtrveggqlg geewtrhgsf vnkptrgwlh pndkvmgpgv 

51 sylvrymgcv evlqsmrald fntrtqvtre aislvceavp gakgatrrrk 

101 pcsrplssil grsnlkfagm pitltvstss lnlmaadckq iianhhmqsi 

151 sfasggdpdt aeyvayvakd pvnqrachil ecpeglaqdv istigqafel 

201 rfkqylrnpp klvtphdrma gfdgsawdee eeeppdhqyy ndfpgkeppl 

251 ggvvdmrlre gaapgaarpt apnaqtpshl gatlpvgqpv ggdpevrkqm 

301 pppppcpgre lfddpsyvnv qnldkarqav ggagppnpai ngsaprdlfd 

351 mkpfedalrv ppppqsvsma eqlrgepwfh gklsrreaea llqlngdflv 

401 restttpgqy vltglqsgqp khlllvdpeg vvrtkdhrfe svshlisyhm 

451 dnhlpiisag selclqqpve rkl 

FIGURE 8 

H.sapiens SHC mRNA. 
ACCESSION X68148 
NID g36453 

KEYWORDS SHC protein. 
SOURCE human. 
ORGANISM Homo sapiens 

Eukaryotae; mitochondrial eukaryotes; Metazoa/Eumycota group; 
Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; 
Vertebrata; Gnathostomata; Osteichthyes; Sarcopterygii; Choanata; 
Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Archonta; Primates; 
Catarrhini; Hominidae; Homo. 
REFERENCE 1 (bases 1 to 3031) 
AUTHORS Pelicci,P. 
TITLE Direct Submission 

JOURNAL Submitted (10-JUN-1992) to the EMBL/GenBank/DDBJ databases. P. 
Pelicci, Clinica Medica I, Policlinico Monteluce, Perugia 06100 
08854, ITALY 
REFERENCE 2 (bases 1 to 3031) 
AUTHORS Pelicci,G., Lanfrancone,L., Grignani,F., McGlade,J., Cavallo,F., 
Forni,G., Nicoletti,I., Grignani,F., Pawson,T. and Pelicci,P.G. 
TITLE A novel transforming protein (SHC) with an SH2 domain is implicated 

in mitogenic signal transduction 
JOURNAL Cell 70 (1), 93-104 (1992) 
MEDLINE 92323554 
FEATURES Location /Qualifiers 

source 1..3031 

/organism="Homo sapiens" 
CDS 82..1503 

/codon_start=1 
/product="SHC transforming protein" 
/db_xref="PID:g36454" 

/translation="MNKLSGGGGRRTRVEGGQLGGEEWTRHGSFVNKPTRGWLHPNDK 
VMGPGVSYLVRYMGCVEVLQSMRALDFNTRTQVTREAISLVCEAVPGAKGATRRRKPC 
SRPLSSILGRSNLKFAGMPITLTVSTSSLNLMAADCKQIIANHHMQSISFASGGDPDT 
AEYVAYVAKDPVNQRACHILECPEGLAQDVISTIGQAFELRFKQYLRNPPKLVTPHDR 
MAGFDGSAWDEEEEEPPDHQYYNDFPGKEPPLGGVVDMRLREGAAPGAARPTAPNAQT 
PSHLGATLPVGQPVGGDPEVRKQMPPPPPCPGRELFDDPSYVNVQNLDKARQAVGGAG 
PPNPAINGSAPRDLFDMKPFEDALRVPPPPQSVSMAEQLRGEPWFHGKLSRREAEALL 
QLNGDFLVRESTTTPGQYVLTGLQSGQPKHLLLVDPEGVVRTKDHRFESVSHLISYHM 
DNHLPIISAGSELCLQQPVERKL" 
BASE COUNT 664 a 855 c 809 g 703 t 
ORIGIN 

1 gcggtaacct aagctggcag tggcgtgatc cggcaccaaa tcggcccgcg gtgcgtgcgg 
61 agactccatg aggccctgga catgaacaag ctgagtggag gcggcgggcg caggactcgg 
121 gtggaagggg gccagcttgg gggcgaggag tggacccgcc acgggagctt tgtcaataag 
181 cccacgcggg gctggctgca tcccaacgac aaagtcatgg gacccggggt ttcctacttg 
241 gttcggtaca tgggttgtgt ggaggtcctc cagtcaatgc gtgccctgga cttcaacacc 
301 cggactcagg tcaccaggga ggccatcagt ctggtgtgtg aggctgtgcc gggtgctaag 
361 ggggcgacaa ggaggagaaa gccctgtagc cgcccgctca gctctatcct ggggaggagt 
421 aacctgaaat ttgctggaat gccaatcact ctcaccgtct ccaccagcag cctcaacctc 
481 atggccgcag actgcaaaca gatcatcgcc aaccaccaca tgcaatctat ctcatttgca 
541 tccggcgggg atccggacac agccgagtat gtcgcctatg ttgccaaaga ccctgtgaat 
601 cagagagcct gccacattct ggagtgtccc gaagggcttg cccaggatgt catcagcacc 
661 attggccagg ccttcgagtt gcgcttcaaa caatacctca ggaacccacc caaactggtc 
721 acccctcatg acaggatggc tggctttgat ggctcagcat gggatgagga ggaggaagag 
781 ccacctgacc atcagtacta taatgacttc ccggggaagg aacccccctt ggggggggtg 
841 gtagacatga ggcttcggga aggagccgct ccaggggctg ctcgacccac tgcacccaat 
901 gcccagaccc ccagccactt gggagctaca ttgcctgtag gacagcctgt tgggggagat 
961 ccagaagtcc gcaaacagat gccacctcca ccaccctgtc caggcagaga gctttttgat 
1021 gatccctcct atgtcaacgt ccagaaccta gacaaggccc ggcaagcagt gggtggtgct 
1081 gggcccccca atcctgctat caatggcagt gcaccccggg acctgtttga catgaagccc 
1141 ttcgaagatg ctcttcgggt gcctccacct ccccagtcgg tgtccatggc tgagcagctc 
1201 cgaggggagc cctggttcca tgggaagctg agccggcggg aggctgaggc actgctgcag 
1261 ctcaatgggg acttcttggt acgggagagc acgaccacac ctggccagta tgtgctcact 
1321 ggcttgcaga gtgggcagcc taagcatttg ctactggtgg accctgaggg tgtggttcgg 
1381 actaaggatc accgctttga aagtgtcagt caccttatca gctaccacat ggacaatcac 
1441 ttgcccatca tctctgcggg cagcgaactg tgtctacagc aacctgtgga gcggaaactg 
1501 tgatctgccc tagcgctctc ttccagaaga tgccctccaa tcctttccac cctattccct 
1561 aactctcggg acctcgtttg ggagtgttct gtgggcttgg ccttgtgtca gagctgggag 
1621 tagcatggac tctgggtttc atatccagct gagtgagagg gtttgagtca aaagcctggg 
1681 tgagaatcct gcctctcccc aaacattaat caccaaagta ttaatgtaca gagtggcccc 
1741 tcacctgggc ctttcctgtg ccaacctgat gccccttccc caagaaggtg agtgcttgtc 
1801 atggaaaatg tcctgtggtg acaggcccag tggaacagtc acccttctgg gcaaggggga 
1861 acaaatcaca cctctgggct tcagggtatc ccagacccct ctcaacaccc gcccccccca 
1921 tgtttaaact ttgtgccttt gaccatctct taggtctaat gatattttat gcaaacagtt 
1981 cttggacccc tgaattcttc aatgacaggg atgccaacac cttcttggct tctgggacct 
2041 gtgttcttgc tgagcaccct ctccggtttg ggttgggata acagaggcag gagtggcagc 
2101 tgtcccctct ccctggggat atgcaaccct tagagattgc cccagagccc cactcccggc 
2161 caggcgggag atggacccct cccttgctca gtgcctcctg gccggggccc ctcaccccaa 
2221 ggggtctgta tatacatttc ataaggcctg ccctcccatg ttgcatgcct atgtactctg 
2281 cgccaaagtg cagcccttcc tcctgaagcc tctgccctgc ctccctttct gggagggcgg 
2341 ggtgggggtg actgaatttg ggcctcttgt acagttaact ctcccaggtg gattttgtgg 
2401 aggtgagaaa aggggcattg agactataaa gcagtagaca atccccacat accatctgta 
2461 gagttggaac tgcattcttt taaagtttta tatgcatata ttttagggct gctagactta 
2521 ctttcctatt ttcttttcca ttgcttattc ttgagcacaa aatgataatc aattattaca 
2581 tttatacatc acctttttga cttttccaag cccttttaca gctcttggca ttttcctcgc 
2641 ctaggcctgt gaggtaactg ggatcgcacc ttttatacca gagacctgag gcagatgaaa 
2701 tttatttcca tctaggacta gaaaaacttg ggtctcttac cgcgagactg agaggcagaa 
2761 gtcagcccga atgcctgtca gtttcatgga ggggaaacgc aaaacctgca gttcctgagt 
2821 accttctaca ggcccggccc agcctaggcc cggggtggcc acaccacagc aagccggccc 
2881 cccctctttt ggccttgtgg ataagggaga gttgaccgtt ttcatcctgg cctccttttg 
2941 ctgtttggat gtttccacgg gtctcactta taccaaaggg aaaactcttc attaaagtcc 
3001 cgtatttctt ctaaaaaaaa aaaaaaaaaa a 

// 
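
The summary figures stated in the record above can be cross-checked arithmetically. The following Python sketch is offered as an illustration only: it tests whether the base counts sum to the stated length (bases 1 to 3031), and whether the CDS interval 82..1503 corresponds to a 473 amino acid product plus one stop codon. The function name check_record and its argument names are hypothetical.

```python
def check_record(base_counts, total_bases, cds_start, cds_end, protein_len):
    """Cross-check the summary figures of a sequence record."""
    # Feature coordinates are 1-based and inclusive.
    cds_len = cds_end - cds_start + 1
    return {
        "base_count_matches_length": sum(base_counts.values()) == total_bases,
        "cds_is_whole_codons": cds_len % 3 == 0,
        # One codon of the CDS is the stop codon, which is not translated.
        "cds_matches_protein": cds_len // 3 - 1 == protein_len,
    }


# Figures as stated in the SHC record (accession X68148).
shc = check_record(
    base_counts={"a": 664, "c": 855, "g": 809, "t": 703},
    total_bases=3031,
    cds_start=82,
    cds_end=1503,
    protein_len=473,
)
print(shc)
```

All three checks hold for the stated figures: 664 + 855 + 809 + 703 = 3031, and 1503 - 82 + 1 = 1422 nucleotides, that is 474 codons including the stop codon.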

NCBI gi: 181975 

FEATURES Location/Qualifiers 
source 1..1109 
/organism="Homo sapiens" 
/sequenced_mol="cDNA to mRNA" 
/tissue_type="brainstem" 
/tissue_lib="gt11 human brainstem library" 
CDS 79..732 
/gene="EGFRBP-GRB2" 
/note="NCBI gi: 181976" 
/codon_start=1 
/product="epidermal growth factor receptor-binding protein 
GRB2" 
/translation="MEAIAKYDFKATADDELSFKRGDILKVLNEECDQNWYKAELNGK 
DGFIPKNYIEMKPHPWFFGKIPRAKAEEMLSKQRHDGAFLIRESESAPGDFSLSVKFG 
NDVQHFKVLRDGAGKYFLWVVKFNSLNELVDYHRSTSVSRNQQIFLRDIEQVPQQPTY 
VQALFDFDPQEDGELGFRRGDFIHVMDNSDPNWWKGACHGQTGMFPRNYVTPVNRNV" 
BASE COUNT 313 a 273 c 262 g 261 t 
ORIGIN 

1 gccagtgaat tcgggggctc agccctcctc cctcccttcc ccctgcttca ggctgctgag 
61 cactgagcag cgctcagaat ggaagccatc gccaaatatg acttcaaagc tactgcagac 
121 gacgagctga gcttcaaaag gggggacatc ctcaaggttt tgaacgaaga atgtgatcag 
181 aactggtaca aggcagagct taatggaaaa gacggcttca ttcccaagaa ctacatagaa 
241 atgaaaccac atccgtggtt ttttggcaaa atccccagag ccaaggcaga agaaatgctt 
301 agcaaacagc ggcacgatgg ggcctttctt atccgagaga gtgagagcgc tcctggggac 
361 ttctccctct ctgtcaagtt tggaaacgat gtgcagcact tcaaggtgct ccgagatgga 
421 gccgggaagt acttcctctg ggtggtgaag ttcaattctt tgaatgagct ggtggattat 
481 cacagatcta catctgtctc cagaaaccag cagatattcc tgcgggacat agaacaggtg 
541 ccacagcagc cgacatacgt ccaggccctc tttgactttg atccccagga ggatggagag 
601 ctgggcttcc gccggggaga ttttatccat gtcatggata actcagaccc caactggtgg 
661 aaaggagctt gccacgggca gaccggcatg tttccccgca attatgtcac ccccgtgaac 
721 cggaacgtct aagagtcaag aagcaattat ttaaagaaag tgaaaaatgt aaaacacata 
781 caaaagaatt aaacccacaa gctgcctctg acagcagcct gtgagggagt gcagaacacc 
841 tggccgggtc accctgtgac cctctcactt tggttggaac tttagggggt gggagggggc 
901 gttggattta aaaatgccaa aacttaccta taaattaaga agagttttta ttacaaattt 
961 tcactgctgc tcctctttcc cctcctttgt cttttttttc atcctttttt ctcttctgtc 
1021 catcagtgca tgacgtttaa ggccacgtat agtcctagct gacgccaata ataaaaaaca 
1081 agaaaccaaa aaaaaaaaac ccgaattca 
// 

hSHIP cDNA Sequence 

5' LTNrrRAMqT.ATFni^ m |^ N fH?f|] 

1 OAATTCGCGG CCGCCTCCAC CCAAGAGG CA ACQQOCOQCA OQTTOCAOTQ 

51 gAGGGGCCTC CCCTteCCTC GG^g^GT GGqj^qQq gQWCCTC^ 

101 GGCCCAGCCO AQGAGGCCCA CGCCCACCA T CGTCCCCTOC TftftAAmir. START CODON 

151 GCAACATCAC CCGCTCCAAG GCGGAGGAGC TGCTTTGCAG GACAGGCAAG 

201 GACGGGAGCT TCCTCGTGCG TGCCAOCGAG TCCATCTTCC GGGCATACGC 

251 GCTCTGCGTG CTGTATCGGA ATTGCGTTTA TACTTACAGA ATTCTGCCCA 

301 ATGAAGATGA TAAATTCACT GTTCAGGCAT CCGAAGGCGT CTCCATGAGG 

351 TTCTTCACCA AGCTGGACCA GCTCATCQAG TTTTACAAGA AGGAAAACAT 

401 GGGGCTGGTG ACCCATCTGC AATACCCTGT GCCGCTGGAG GAAGAGGACA 

451 CAGGCGACGA CCCTGAGGAG GACACAGAAA GTGTCGTGTC TCCACCCGAG 

501 CTGCCCCCAA GAAACATCCC GCTGACTGCC AGCTCCTOTC AGGCCAAGGA 

551 GGTTCCTTTT TCAAACGAGA ATCCCCGAGC GACCGAGACC AGCCGGCCGA 

601 GCCTCTCCGA GACATTOTTC CAGCGACTGC AAAGCATGGA CACCAOTGGG 

651 CTTCCAGAAG AGCATCTTAA GGCCATCCAA GATTATTTAA GCACTCAGCT 

701 CGCCCACGAC TCTGAATTTG TGAAGACAGG GTCCAGCAGT CTTCCTCACC 

751 TGAAGAAACT GACCACACTG CTCTOCAAGC AGCTCTATGG AGAAGTCATC 

801 CGGACCCTCC CATCCCTGGA GTCTCTGCAG AGGTTATTTG ACCAGCAGCT 

851 CTCCCCGGGC CTCCGTCCAC GTCCTCAGGT TCCTGGTCAG CCCAATCCCA 

901 TCAACATGGT GTCCAAGCTC AGCCAACTGA GAAGCCTGTT GTCATCCATT 

951 GAAGACAAGG TCAAGGCCTT GCTGGACGAG GGTCCTGAGT CTCCGCACCG 

1001 GCCCTCCCTT ATCCCTCCAG TCACCTTTOA GGTGAAGGCA GAGTCTCTCG 

1051 GGATTCCTCA GAAAATGCAG CTCAAAGTCG ACGTTGAGTC TGGGAAACTG 

1101 ATCATTAAGA AGTCCAAGGA TGGTTCTGAG GACAAGTTCT ACAGCCACAA 

1151 GAAAATCCTG CAGCTCATTA AGTCACAGAA ATTTCTGAAT AAGTTGGTGA 

1201 TCTTGGTCGA AACAGAGAAG GAGAAGATCC TCCGGAAGGA ATATGTTTTT 

1251 GCTGACTCCA AAAAGAGAGA AGGCTTCTGC CAGCTCCTGC AGCAGATGAA 

1301 GAACAAGCAC TCAGAGCAGC CGGAGCCCGA CATGATCACC ATC7TCATCG 

1351 GCACCTGGAA CATGGGTAAC GCCCCCCCTC CCAAGAAGAT CACGTCCTGG 

1401 TTTCTCTCCA AGGGGCAGGG AAAGACGCGG GACGACTCTG CGGACTACAT 

1451 CCCCCATGAC ATTTACGTGA TCGGCACCCA AOAGGACCCC CTGAGTGAGA 

1501 AGGAGTGGCT GGAGATCCTC AAACACTCCC TGCAAGAAAT CACCACTCTC 

1551 ACTTTTAAAA CAGTCGCCAT CCACACGCTC TGGAACATCC GCATCGTGGT 

1601 GCTGGCCAAG CCTGAGCACG AGAACCGGAT CAGCCACATC TGTACTGACA 

1651 ACGTGAAGAC AGGCATTGCA AACACACTGG GGAACAAGGG AGCCGTGGGG 

1701 GTGTCGTOCA TCTTCAATGG AACCTCCTTA OGGTTCGTCA ACAGCCACTT 

1751 GACTTCAGGA AGTGAAAAGA AACTCAGGCG AAACCAAAAC XATATGAACA 

1801 TTCTCCGGTT CCTGGCCCTG GGCGACAACA AGCTQAGTCC CTTTAACATC 

1851 ACTCACCGCT TCACGCACCT CTTCTGGTTT GGGGATCTTA ACTACCGTGT 

1901 GGATCTGCCT ACCTGGGAGG CAGAAACCAT CATCCAAAAA ATCAAGCAGC 

1951 AGCAGTACGC AGACCTCCTG TCCCACCACC AGCTGCTGAC AGAGAGGAGG 

2001 GAGCAGAAGG TCTTCCTACA CTTCGAGGAG GAAGAAATCA CGTTTGCCCC 

2051 AACCTACCGT TTTGAGAGAC TGACTCGGGA CAAATACGCC TACACCAAGC 

2101 AGAAAGCGAC AGGGATGAAG TACAACTTGC CTTCCTGGTG TGACCGAGTC 

2151 CTCTGGAAGT CTTATCCCCT GGTGCACGTG GTGTGTCAGT CTTATGGCAG 

2201 TACCAGCGAC ATCATGACGA GTGACCACAG CCCTGTCTTT GCCACATTTG 

2251 AGGCAGGAGT CACTTCCCAG TTTGTCTCCA AGAACGGTCC CGGGACTGTT 

2301 GACAGCCAAG GACAGATTGA GTTTCTCAOG TGCTATGCCA CATTGAAGAC 

2351 CAAGTCCCAG ACCAAAOTCT ACCTGGAGTT CCACTCGAGC TGCTTGGAGA 

2401 GTTTTCTCAA GAGTCAGGAA GGAGAAAATG AAGAAGGAAG TGAGGQGGAG 

2451 CTGGTGGTGA AGTTT CG T G A GAGTCTTCCA AAGCTGAAGC CCATTATCTC 

2501 TGACCCTGAG TACCTGCTAG ACCAGCACAT CCTCATCAGC ATCAAGTCCT 

2551 CTGACAGCGA CGAATCCTAT GGCGAGGGCT GCATTGCCCT TCGGTTAGAC 

2601 GCCACAGAAA CGCAGCTOCC CATCTACACG CCTCTCACCC ACCATGGGGA 

2651 GTTGACAGGC CACTTCCAGG GGGAGATGAA GCTGCAGACC TCTCAGGGCA 

2701 AGACGAGGGA GAAGCTCTAT GACTTTGTGA AGACGGAGCG TGATGAATCC 

2751 AGTGGGCGAA AGACCCTGAA GAGCCTCACC AGOCACGACC CCATGAAGCA 

2801 GTCCGAAGTC ACTAGCAGGG CCCCTCCGTG CAGTCCCTCC AGCATCACTC 

2851 AAATCATCAA CCCCAACTAC ATGGGAGTGG CGCCCTTTGG GCCACCAATG 

2901 CCCCTGCACG TGAAGCAGAC CTTOTCCCCT GACCAGCAGC CCACAGCCTG 

2951 GAGCTACGAC CAGCCGCCCA AGGACTCCCC GCTGGGGCCC TGCAGOOGAG 

3001 AAAGTCCTCC GACACCTCCC GOCCAGCCGC CCATATCACC CAAGAAGTTT 
3051 TTACCCTCAA CAGCAAACCQ OCCTCTCCCT CCGAGGACAC AGGACTCAAQ 

3101 GCCCAGTGAC CTOOGOAAGA ACGCAGCOQA CACGCTGCCT CAGGAGOACC 

3151 TGCCGCTGAC GAAGCCCGAG ATCTPTQAGA ACCCCCTOTA TGGGTCCCTG 

3201 AGTTCCTTCC CTAAGCCTCC TCCCAOGAAG GACCAGGAAT CCCCCAAAAT 

3251 GCCGCCGAAG GAACCCCCGC CCTCCCCGGA ACCCGGCATC TTGTCQCCCA 

3301 GCATCGTGCT CACCAAAGCC CAGGAGGCTG ATCGCGGCGA GGGGCCCGGC 

3351 AACCAGGTGC CCGCGCCCCG GCTCCGCTCC TTCACGTGCT CATCCTCTCC 

3401 CGAGGGCAGG GCGGCCGGCG GGGACAAGAC CCAAGGGAAG CCGAAGACCC 

3451 CGGTCAGCTC CCAGGCCCCG GTQCCCGOCA AOAGCCCCAT CAAGCCTTCC 

3501 AGATCGGAAA TCAACCAGCA CACCCCOCCC ACCCCQACGC CQCGGCCGCC 

3551 GCTCCCAGTC AAGAGCCCGG CGGTGCTCCA CCTCCAGCAC TCCAAGGGCC 

3601 GCGACTACCG CGACAACACC GACCTCCCGC ATCACGGCAA GCACCGGCCG 

3651 GAGGAGGGGC CACCAGGGCC TCTAGGCAGG ACTGCCATGC AGTGA AGCCC STOP CODON 

3701 TCAGTGAGCT GCCACTGAGT COQGAGCCCA CAGGAACGGC QTGAAnrgg 

3751 TGaKCCCTCT CggotoXCCT CCTOCTCCCT CCTLL^tX' JuSS^ffi 

3801 GCAAQGCTTT ^IWlTlOAli UAAAOdBCW XfeCfTCTGTC totecCACAG 

3851 AtittfrgMft&J gfgrtaafeCT TAGfcAiMAAG TGCTGAGCtt GGAAGAAAA A 

3901 CGCACACCAG ktGGGCAkZX AXtjAMWfia SKtttAGCT CtidrtTTOGT 

3951 kCvrtetiAdC CCACT&KW k*t^Ml&t£ (^caW^tqXa gaaaggaact 

4001 GCAGCGCCGA TTTGAGQGTG GAGATATAQA TAATAATAAT ATTAATAATA 

4051 ATAATGGCCA CATGCATCGA ACACTCATGA TCTG CCAACT GCTGTOCTAA 

4101 GTOCTTTAOC AACATTCGTC ATATCAGGAT GACCTCCAGA GOTGAGCCTC 

4151 TAGCCACCTA AAACACQTOC CCAAACCCAC CAGOTTAAAA CGGTG T GTGT 

4201 TCGGAGGCCT GAAAGCATTA AGAAGCCCAG TGCCCTCCTC GAGTGAGACA 

4251 AGGGCTCGGC CTTAAGGAGC TGAAGAOTCT QfefflAGfctTG TTTAGGGTAC 

4301 AAGAAGCCTG TTCTGTCCAG CTTCAGTQAC XCAAGCTGCT TTAGCTAAAG 

4351 TCCCGCGGCT TCCGGCATGG CTAGGCTGAG AGCAGGGATC TACCTGGCTT 

4401 CTGAGTTCTT TGGTTGGAAG GAGCAGGAAA TCAQCTCCTA TTCTCCAGTG 

4451 GXGAriATCfS SCfcfCASCW GGGWXBX5X MMJJ166CC TGTGCCAGCT 

4501 TCCCTGTGCC CICCTCCAcfe W<56gXflggX teACCA&ttA GACTTAAGCC 

4551 AAGCCCCCCA ACATGTATTC CATCGTGCTG 6TA^KXgXGT C rri W rU Tfr 

4601 GCTCCCGAAA GCCGTGCTCT CCAGCCTOGC TGCCAGGGAG GGTGGGCCTC 

4651 TTCCTTCCAG GCTCTTCAAA TAQTOCAGCC TTTTCTTCCT ATCTCT G TGG 

4701 CTTTCAQCTC TOCTTCCTTG GTTAfrrACSfeA GAA*A6ATGG GTGATGTCTT 

4751 frCCTTATGTT GCTTTTTCAA CATAGCAGAA TTAATGTAGG GAGCTAAATC 

4801 CAGTGGTGTC TGT<3AAfl5£A GAAGGGAAT6 gAGCCtA6AT TCCCATGATG 

4851 GAAgTCTGCG TAACCAATAA ATTGTCCfrW ¥6TTAAAAAT 'fcCCCGCCGC 

4901 crrcGA«KW XC6oggcc6c gaX TR " ***** 

5' UNTRANSLATED REGION £3625=4223 

hSHIP Amino Acid Sequence 

1 MVPCWNHGNI TRSKAEELLC RTGKDGSFLV RASESIFRAY ALCVLYRNCV 

51 YTYRILPNED DKFTVQASEG VSMRFFTKLD QLIBFYKKKN MGLVTHLQYP 

101 VPLEEEDTGD DPEEDTESW SPPELPPKNI PLTASSCEAK EVPFSNHHPR 

151 ATETSRPSLS ETLFQRLQSM DTSGLPEEHL KAXQDYLSTQ LAQDSEFVKT 

201 GSSSLPHLXX LTTLLCKKLY CEVIRTLPSL ESLQKLFDQQ LSPGLRPRPQ 

251 VPGEANPINM VSKLSQLTSL LSSIEDKVKA LLHBGPESPH RPSLIPFVTF 

301 EVKAESLGIP QKMQLKVDVE SGKLIZRXSK DGSEDKFYSH KKILQLZKSQ 

351 KFLNKLVILV ETEXEKILRX EYVFADSKXR BGFCQLLQQM KNKHSEQPEP 

401 DMITIFIGTW NMGNAPPPKK ITSWPLSKGQ GKTRDDSADY IPHDIYVIGT 

451 QEDPLSEKEW LEILKHSLQE ITSVTPKTVA IHTLWNIRTV VLAKPEHENR 

501 ISHICTDNVK TCIANTLGNK GAVCVSFMFN GTSLGFVKSH LTSGSEKKLR 

551 RNONYMNILR FLALGDKKLS PFNITHRPTH LFWFGDLNYR VDLPTWEAET 

601 IIQKIKQQQY A0LL5HDQLL TERREQKVFL HPBEEBITFA PTYRFERLTR 

651 DKYAYTKQKA TGMKYNLPSW CDRVLWKSYP LVHWCQSYG STSDIKTSDH 

701 SPVFATFEAG VTSQFVSKNG PGTVDSQGQX EFLRCYATLK TKSQTKFYLE 

751 FHSSCLESFV KSQEGENEEG SEGELWKFG ETLPKUCPII SDPEYLLDQH 

801 ILISIKSSDS DESYGEGCIA LRLEATETQL PIYTPLTHHG KLTGHPQGEI 

851 KLQTSQGKTR EKLYDFVKTE RDESSCPKTL KSLTSHDPKK QWEVTSRAPP 

901 CSGSSITEII NPNYMCVGPP GPPMPLHVKQ TLSPDQQPTA WSYDQPPKDS 

951 PLGPCRGESP PTPPGQPPIS PKKFLPSTAN RGLPPRTQES RPSDLCKNAG 

1001 DTLPQEDLPL TKPEKFENPL YGSLSSFPKP APRKDQESPK MPRKEPPPCP 

1051 EPGILSPSIV LTXAQEADRG EGPGKQVPAP RLRSFTCSSS AEGRAAOGDK 

1101 SQGKPKTPVS SQAPVPAKRP IKPSRSE1NQ QTPPTPTPRP PLPVKSPAVL 

1151 HLQHSXGRDY RBNTELPHHG KHRPEEGPPG PLGRTAMQ 

(Peptide) FASTA of: hshipcom.pep from: 1 to: 1188 April 3, 1996 13:17 

TRANSLATE of: hshipcom.con check: 8429 from: 129 to: 3693 
generated symbols 1 to: 1188. 

TO: 145com.pep Sequences: 1 Symbols: 1,303 Word Size: 2 

Scoring matrix: GenRunData:fastapep.cmp 
Variable pamfactor used 
Gap creation penalty: 12.0 Gap extension penalty: 4.0 

The best scores are: init1 initn opt 

/gcg/users/patty/145com.pep TRANSLATE of: 145com.con che.. 4283 4937 5189 

hshipcom.pep 
/gcg/users/patty/145com.pep 

TRANSLATE of: 145com.con check: 4605 from: 130 to: 4040 
generated symbols 1 to: 1303. 

SCORES Init1: 4283 Initn: 4937 Opt: 5189 

87.2% identity in 1194 aa overlap 

. 10 20 30 40 50 

hshipc MVPCWNHGNITRSKA5ELX£RTGKIXjSFLTOA^ 

III MMMM MMMMMMMM M MMM Ihllllhlllllllllll 
145com MPAOTPGWNHGNITRSKAEELLSRAGKDGSFLVRAS^ 

10 20 30 40 50 60 

60 70 80 90 100 i 110 

hshipc NEDDKFTVQASSGVSNRFFTKLDQLIEFYKKENMC^ 

M M M MM M M MMMM M M :| I II M MMMMMM M MM MMMM 
145COm NEDDRFTVQASBGVPMRFPTKLDQL IDFYKKBNMGLVTHLQYPVFLEEBDAIDEAEIHDTB 
70 80 90 100 110 120 

120 130 140 150 160 170 

SWS PPBLP PRNI PLTASS CEAKBVFFSNBNPRATBTSRPSLSETLFQRLQSMDTSGLPE 

I hill I Mill I I::|:::|lh:|:::|||lh|::| 1 I I I I II I I I I I I I I I I I 1 I 
SVMSPPBLPPRNIPMSAGPSRATOLPIiATENPRM 

"130 140 150 160 170 180 

180 190 200 210 220 230 

hshipc EHLKMQDYLSTQIAQDSEFVKTO^ 

MM Ml MINI IhhllMhUIIIII MMMMMMI MM IM MM I 

145com BHLKAXQDYLSTQLLLDSDFLXTCSSNLPHL!^^ 

190 200 210 220 230 240 

240 250 260 270 280 290 

hshipc lXX}I^PGLRPRPQvTCBANPINMVSiaSQLTSU»SSlBD 

lllllllllllllllll|:|hlhlMlllllllllliri|:|||||:.lh:hll||| 
14 5 com DCWl^PGLRPRPQVPGEASPITWAJO^ 

250 260 270 280 290 300 

300 310 320 330 340 350 

hshipc VTFETVKAESI^IPQKMQIJCVDVESGKXIIKXSKMSE 
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IMIIi:||||i|[|t;|||||t|||l!:||i|!|]|MI|||||||||||||tllllll 
14 5COm VTFffVXSBSLGIPQKMHMCm^ 

310 320 330 340 350 360 

360 370 380 390 400 410 

hBhipc ILVBTEKEKILIUCEYVPADSKXRBGPCQ^ 

1 1 1 1 1 1 1 1 1 1 I I I 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 II I i 1 1 I 1 1 1 1 II 1 1 1 1 1 1 1 1 I M I I | M 1 1 I 

145com ILVETEKEKILRKEYVFADSKiaiEGPCQUiQQM^ 

370 380 390 400 410 420 

420 430 440 450 460 470 

hshipc PKKITSWFI£KGQGKTRDDSADY^ 

lllllllMlllMlllllllllMMIlllHMIlMlllMhMllllMl MM I 

145cora P10Cri5WFIiSKGQGKrSU>DS^ 

430 440 450 460 470 480 

480 490 500 510 520 530 

h9hipc TVAXHTXMflftlVVIiAKP^^ 

MINI I llllllil Mill I III MINI IMIMIIMMMIitlllMIIillll I 

145com TVAIHTWOTRIVVIJUCPBHBNRIS^ 

490 500 510 520 530 540 

540 550 560 570 580 590 

hshipc NSHLTS GS EKKLRRNQNYMN ILRFLALGDKKLS PFNITHRTTHLIT^GDIjNTRVDLPTWK 

1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 1 f L 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ^ I E 1 1 1 1 1 ^ 1 1 1 1 1 

145com NSHLTSGSEKKLRRNQNYMNIIJ^^ 

550 560 570 SBO 590 600 

600 610 620 630 640 650 

hshipc AETI IQKIKQQQYADLLSHDOLLTERREQKVFLH FEB BE ITFAPTYRPERLTRDKYAYTK 

IM II M MIIIMII M M II I ll-IMIIIMMMIMIMIMIIIIIMIIII 

145com AEAI IQKIKQQQYSDLLAHDQIiIiIiERKDQKVFLHPEEBEITPAPTTO 

610 620 630 640 650 660 

660 670 680 690 700 710 

hshipc OKATGMKYNLPS WCDRVIAWS YPLVHVVCQS YGS TS D IMTSDHS PVPATFEAGVTSQFVS 

llllllllllii I Mill llllilllllll IIIII1IMIIIIIIIIIIIIIM! Mill 

14 Scorn QKATMKYI^SWCDRVLWICSYPLVHVVCQSYGSTSDI^ 

670 680 690 700 710 720 

720 730 740 750 760 770 

hshipc KNGPGTVDSQGQIEPLRCTfATIJCTKS 

M I M I I I ! I I 1 1 1 I t I 1 1 I I 1 1 ! 1 1 1 1 1 1 f I 1 I I t 1 I t 1 1 I I I t 1 1 I I I t I I V 1 1 1 t I 
14 5 com KJIGPGTVDSC^OIBFLACYATLKTTCSC^ 

730 740 750 760 770 780 

780 790 800 810 820 830 

hshipc KF6BTLPKLKPI ISDPE YLLDQHILISIXSSDSDESYGEGCIALW.BATBTQLPIYTPLT 

: 1 1 1 1 1 1 M 1 1 M M 1 1 M M I ill t i tl M i I i 1 1 ! 1 1 M 1 1 M 1 1 : 1 h I 1 1 1 H 1 1 

14 Scorn RFGBTLPJOXPIISDPEYIJLDC^ 

790 800 610 620 830 840 

640 850 860 870 880 890 

hshipc HHGBLTGHPQGEIKLQTSQGKTRBXLYD 

I II MIMMII IMMM M IIIIMM I IIIIMM I 1 1 . 1 1 1 1 1 1 h i 1 1 -I 

14 Scorn HHGEMTGHFRGBIJGjQTSQGKMRBKLYDFWCTERDE 

850 860 870 880 890 900 

900 910 920 930 940 950 
:|>h ll"Mlllhl«llll> III I "1111111 1111111 Nihil! II 

145com VPACGVSSLNEMINFWYTWGPFGQ- - PLHGKTLSPDOQLTAWSYDQLPKDSSLGPGRG 
910 920 930 940 950 

960 970 980 • 990 1000 1010 

hshipc BSPPTPPGQPPXSPKKFLPSTANRGLPPRTQESRPSDLGK^AGDTLPQBDLPLTXPBMPE 

hllllhilhlllll «lhlll IhlhlhMII ::| llll I 1 1 1 1 I I 1 

14 5 com EGPPTPPSOPPLSPKKFSSSTTKRGPCPRVQBARPGDLGK- -VBALLQ2DLLLTKPBMPE 
960 970 980 990 1000 1010 

1020 1030 1040 1050 1060 1070 

hflhipc NPLYGSLSSFPKPAPR1GDQBSPKMPRKBPPPCPE 

IMIIhlllll :|lhllllll milllhlll llllll'lllh". «|>t|h 
145Cora NPLYGSVSSFPKLVPRKEQESPKMIJOT^^ 

1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 1130 

hshipc PAPRLRSPTCSSSAEGRAAGGDKSQGKPlCrPVSSOAPVPAKRPIKPSRSElNQQ 

hlhllllllllllll ::||llllllh::|Mlllhllhlllllh:|| 
14 5 com I^PVWPTPRII^FTCSSSABGR^GDKSQGKPKM 

1080 1090 1100 1110 1120 1130 

1140 1150 1160 1170 1180 

hshi pc TPPTPTPRPPLPVKSPAVXJttXJHSKGRDYTOOT 

hhhllllllllllllhlllllllllllllllllllllhll I IMIIII 

14 Scorn TTP I PAPRPPLPVKS PAVLQLQHS KGRD YRDNTELPHHGKHRQBB - - -GLLGRTAMQXAA 
1140 11S0 1160 1170 1180 1190 

14 SCOm GDRSLEBQHKADLRPLSGCLSQDASWRTS CXLPL PS PKSQAVYFFS GNGIjTSLWSKKCAA 
1200 1210 1220 1230 1240 1250 
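
The "% identity in N aa overlap" figure reported above can be understood as the number of aligned positions carrying identical residues, divided by the length of the overlap. The Python sketch below illustrates one common convention, in which gap columns count toward the overlap length; this convention is an assumption, and the two short aligned strings and the function name percent_identity are hypothetical rather than taken from the alignment shown.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Gap columns are counted in the overlap length but never as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical ten-column alignment with one mismatch and one gap.
seq_a = "ACDEFGHIKL"
seq_b = "ACDQFG-IKL"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity in {len(seq_a)} aa overlap")
```

In this hypothetical case eight of ten columns are identical, so the sketch reports 80.0% identity in a 10 aa overlap.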

FIGURE 13 

(Nucleotide) FASTA of: hshipcom.con from: 20 to: 4896 April 3, 1996 13:08 

TO: 145com.con Sequences: 1 Symbols: 4,040 Word Size: 6 

Scoring matrix: GenRunData:fastadna.cmp 
Constant pamfactor used 
Gap creation penalty: 12.0 Gap extension penalty: 4.0 

The best scores are: init1 initn opt 

/gcg/users/patty/145com.con 8658 10037 10667 

hshipcom.con 
/gcg/users/patty/145com.con 

SCORES Init1: 8658 Initn: 10037 Opt: 10667 

81.6% identity in 4019 bp overlap 

20 30 40 50 

hshipc CCCAAGAGGCAAOGGGCGGCAGGTTGCAG- -TGG 

llitlllll llllllllll III I I 

145COQI CCXTTGCTAGGAGCAGCAGAGGCAATTTCTCA^ 

10 20 30 40 50 60 

60 70 80 90 100 110 

hshipc AGGGGCCTCCGCTC - CCCTCGGTGGTGTGTCKXritrTGGGGGro 

mi i i i i in mini 1 1 i 1 1 f 1 1 in mi i miii i 

14 Scorn AGAGGGCCCTGAACTACTTTOCTGGAGTGTCCGTC 

70 80 90 100 110 120 

120 130 140 150 160 170 

hshipc A/MAGGCCXIACGCCGACCATGGTCCCCT 

Mill IMI IN llllllllll I MIIIIIIIIIIIMIMIIIIIMMIII 

145com AGGAGACCCATGCCTGCCATGGTCCCTGGGTGG^ 

130 140 150 160 170 180 

180 190 200 210 220 230 

hshipc occxraxauc xnuL^ ^ 

|| i 1 1 1 1 1 1 1 IMI III I ItlMtlliillllllllll lllllllllllllll 

14 5 com GOteAGGAGCTACrrTCCAGAGCCGGCA^ 

190 200 210 220 230 240 

240 250 260 270 280 290 

hshipc TCCATCTTCCGGGCATACGCGCTCTGCGTGCTtTTA^ 

HUM 1 1 1 1 1 1 I Ml 1 1 1 1 1 1 1 K 1 1 1 1 1 II III MINIM mini! 

145com TCOVTCCCCCGGGCCKKXK1ACTCTGCGTC

250 260 270 280 290 300 

300 310 320 330 340 350 

hshipc ATTCTGCCCAATGAAGATGATAAATTCACTGT^ 

IIIIIIMIIMM II MIIIIMMMIIMMMIIIIMIII III 1 1 1 1 1 1 1 1 

145com ATTCTGCCCAATGAGGAJCX1ATAAATTCACTGTTC 

310 320 330 340 350 360 
360 370 380 390 400 410

hshipc TTCrrCACOUGCTOGACCAGCTCATC^^ 

ilium iiiHiiiiiiiiiiiiiir.iiiiiiiiiiiiiiiiiiiiiniiiiiii 

145com TTOTCACOAAGCTGGACCACCTCATCGACTI^^ 

370 380 390 400 410 420

420 430 440 450 460 470

hshipc ACCCATCTGCAATACCrTGT^cdCTGGAGCyUtaUMAC^^

inn inn imi inn ilium inn i u n iiiiimi 

145com ACCCyiCCTG RAGTAG TCCGTCCCCCTGGACGACGAGGATGCTATTGAT^^ 

430 440 450 460 470 480

480 490 500 510 520 530 

hshipc ^^^^^ 

145com GACACTGAAAGTOTCMOTCACCMCTtSAGCTGC^ 

490 500 510 520 530 540

540 550 560 570 580 590

hshipc AGCTCCTGTGAGGCCAAGG^GGTTCCIUTIU'CAAACGAGAATCCCCGAGCGACCGAGACC

i ii i iiiiiiiiiii inn ii in imi iiiiimi i in i 

145com GGGCCCAGCGAGGCCAAGGACCTTCCTCTTGCAACAC^^ 

550 560 570 580 590 600

600 610 620 630 640 650 

hshipc AGCCGGCCXy^jCCTCTCCGAOACATTGT^ 

I Mill III Hllllllllll itll Mill II II IIIIIMI Itlllllll 

145com ACCCGGCTCAGTCTCTCOSAQACACIX^

610 620 630 640 650 660 

660 670 680 690 700 710

hshipc CTTCCAGAAGAGCATCTTAAGGCCATCCA^

IMM II Mill II II IIIIIMI llllll I IMMMMMI II Ml 

145com CTTCCCCUUKSAGCACCTGAAAGCC^^ 

670 680 690 700 710 720 

720 730 740 750 760 770 

hshipc TCTGAATTT(nt3AAGACAGGGTC 

M M III M M II II MIMM II 1 1 1 1 1 1 1 ! 1 1 1 M ! Mil Mill 

145com TCCGACTTITO^JJUICGGGC^ 

730 740 750 760 770 780 

780 790 800 810 820 830

hshipc ctctccaaggagctctatggagaagtcatcc^

MIMIMIMMM IMI IMMMM MM II IMMMMMI Ml HMD 

145com CTtWXJiAGGAGCTCCATG^

790 800 810 820 830 840

840 850 860 870 880 890 

hshipc aggttatttgaccagcagctctm 

Ml Mill Ml M MMMMMI IMM II Mill IMIMM II II III 

145com AGGTTt^TnSACO^CAOC^

850 860 870 880 890 900

900 910 920 930 940 950 

hshipc GCCAATCCCATCAACATGGTOTCCAAGCT 

MM II 1 1 1 1 II llllll MM IMMMM I ! 1 ! 1 1 1 III MM MMII 

145com GCCAGTCCttTCACCATCXr)^

910 920 930 940 950 960 
960 970 980 990 1000 1010

hshipc GAJUlfrCAAGGTCAACGC Cnn^ 

Hill MINIMI IMMMMMMIM I || III | M III MMM 

145com GAAGATMGGTCAAGTCCTTGCTC

970 980 990 1000 1010 1020

1020 1030 1040 1050 1060 1070 

lillllll MIMMIMIIIIIIM 1 1 1 1 1 1 1 Mill illllllllillllill 

145com ATCCCTCCGGTCJU:CTIT3A0GTGAAGT^^

1030 1040 1050 1060 1070 1080

1080 1090 1100 1110 1120 1130 

hshipc CTCAAAGTCGACGTTGAGTCTG^ 

lillllll 111111111111111111111111 llllllllllllllllllllllflll 

145com CTCAAAGTGGACGTTOAGTCTGGGAAACTGATO 

1090 1100 1110 1120 1130 1140 

1140 1150 1160 1170 1180 1190 

hshipc GACAAGTTCTACAGCCACAAGAAAATCCTGC^

llllllllllllllllllll I II 1 1 II IC 11 !! I II 1 1 1 1 1 1 1 1 1 1 1 HIM || 

145com GACAACTTTCTACAGCCACAAAAA^ 

1150 1160 1170 1180 1190 1200 

1200 1210 1220 1230 1240 1250 

hshipc AAGTTGGTGATCTTGGTGGAA^ 

illllllllll lillllll II lllllllllll Mllll Illllllllillllill 
145com *teTTGGTGATTriX3GrJX3^^

1210 1220 1230 1240 1250 1260 

1260 1270 1280 1290 1300 1310 

hshipc GCTGACTCCAAAAAGfcGAG^

lillllll II II lllllllllltill II llllllllllllllllllllllllll 

145com GCTGACTCTiOGAAAAQAQAAGGCn^ 

1270 1280 1290 1300 1310 1320 

1320 1330 1340 1350 1360 1370 

hshipc TCAGAGCAGCCGGAGCCCGACATC^ 

II lillllll Hill llllllllllllllllllll Mill IIIIIIIIIMIII 
145com TC^KjAGCAKCAGAGCCTGACATQATCACCA^

1330 1340 1350 1360 1370 1380 

1380 1390 1400 1410 1420 1430 

hshipc GCCCCCCCTCCCAAGAM»TCAQOTCCT 

II 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 ! 1 1 1 III 

145com CCaceCCCTCCCAftQAM^

1390 1400 1410 1420 1430 1440 

1440 1450 1460 1470 1480 1490 

hshipc GACGACTCTGC GGACT&CATCC CCCATGACATTTAC GTGATCGG CACCCAAGAGGACCC C 

lllllllllll I I i i I I I I I M I I I i I 1 1 I I II I I I I I lillllll Mill III 
145com GACGACTCTCXrroACTACATCCCCCAT^ 

1450 1460 1470 1480 1490 1500 

1500 1510 1520 1530 1540 1550 

hshipc CTGACnttAaAAQGAGTX^^ 

it i 1 1 1 1 1 f f 1 1 1 1 j 1 1 1 1 1 1 i mi 1 1 1 1 1 1 1 1 1 1 j 1 1 1 1 mini II 

145com CTTOaAOAGMGGKGTOOCTXKJAOCT

1510 1520 1530 1540 1550 1560
1560 1570 1580 1590 1600 1610

hshipc ACTTTTAAAACAOTCCCCATCCAC^CGCTCTXKl^

ii milium nit Mini i 1 1 1 1 1 1 1 1 1 1 1 iii it 1 1 1 m 1 1 1 mill 

145com ACATTTAAAACAGTTQCCATCCACACCCTCTO 

1570 1580 1590 1600 1610 1620

1620 1630 1640 1650 1660 1670

hshipc CCTX3AGCACGAGAACCGOATCAGCCACATCT^

ii mil inn minimi rim iimmimmnmm 11 

145com rcAGAGCATGAGAATCGQATCAGCCATATCTGCXCTU

1630 1640 1650 1660 1670 1680 

1680 1690 1700 1710 1720 1730 

hshipc AACACACTGQGGAACAASGGAGCCGTOGGGGTGTCGTTC ^ 1X3 l Utl AATBGAACCTCCITA

urn inn iiiimim inn mil immiimmmmu 

145com AACACCCTGGGAAACAAGGGAGCAGTGGGAGl^^ 

1690 1700 1710 1720 1730 1740 

1740 1750 1760 1770 1780 1790

hshipc (XKSTTCGTCAACTkGCCACTTCACTI^

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 m 1 1 1 1 1 1 1 milium n mm mi mm 

145com GGQTTCQTCAACAGCCACT^

1750 1760 1770 1780 1790 1800

1800 1810 1820 1830 1840 1850

hshipc TATATGAAf^ | | | | | | {|||lM|i II II II I I 1 I ! 1 1 I 1
145com TATATGAACATCCrTCCGGTTCCT^ 

1810 1820 1830 1840 1850 1860 

1860 1870 1880 1890 1900 1910

hshipc ACTCACCGCTTC^CGCACCTCT^

ii 1 1 1 1 1 1 1 1 1 1 1 mmiiim 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 urn inn 

145com ACCC^CXKrTTCACCCACCTCTTCTM

1870 1880 1890 1900 1910 1920

1920 1930 1940 1950 1960 1970

hshipc ACCT(XK3Jtf3GCAGAAACCATCATCCAAA

ii imimiii it i i 1 1 1 1 1 1 ii iiiiiiii mum mini m 

145com ACTTGGGAGGCAGAGGCCATCATCC

1930 1940 1950 1960 1970 1980 

1980 1990 2000 2010 2020 2030 

hshipc TCCCACGACCAGCTGCTCACAGAGAGQAGGGAG

immm mm mini m miiminm inn mm 

145com GCCCACGACCAACTCCTCCTOGAGAGGAAGG^ 

1990 2000 2010 2020 2030 2040 

2040 2050 2060 2070 2080 2090 

hshipc OUlGAAATCACCrrTTGCCCCAA

inn urn ii inn inn n inn mmii mum n n 

145com GAAGAGATCACCTTCGCCCCCACCTATCOATT^

2050 2060 2070 2080 2090 2100 

2100 2110 2120 2130 2140 2150 

hshipc TACACCAAGCAQAAAGCGACAGCX2ATGAA 

urn minimi iiimjiiiiiiiiiiimiiii mimi minm 

145com TJtfSAOGJUtfKZAOfrAACCAACAflGG^ 

2110 2120 2130 2140 2150 2160
2160 2170 2180 2190 2200 2210

hshipc CTCTGGAAGTCTTATCCCCTCXy^ 

1 1 i ! 1 1 1 1 1 1 1 1 1 1 II llllllll Mill llllllll lllllllillllll III 

145com CTCTGGA^CTTACCCGCtWTGCAltOT 

2170 2180 2190 2200 2210 2220 

2220 2230 2240 2250 2260 2270 

hshipc ATCATQACGAGTGACCACAGCCCTGTC^ 

Ml IMI IMM MINI lllllllllll 1 1 1 III Kill lllllllllll II II 

145com ATCATGACGAGTQACCACW5CCCTGTCXTTC

2230 2240 2250 2260 2270 2280 

2280 2290 2300 2310 2320 2330 

hshipc TTTGTCTCCAAGAACGGTCCCXSGC^^

ii iiiiiiiiiii inn ii urn ii imim inn mum 

145com TTCGTCTCCAAGAATGGTCCTGGCACTQTAGATAGCCA

2290 2300 2310 2320 2330 2340 

2340 2350 2360 2370 2380 2390 

hshipc TGCTATGCCACATTGAAfiACCAJUnXXXI^

inn mm iiiuiimuumu n nun uuiimuu m 

145com TCCTACGCCACACTGARGACaU^

2350 2360 2370 2380 2390 2400 

2400 2410 2420 2430 2440 2450 

hshipc TCXrrTGGaGA GTTTO 

urn 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 mum mum n m 

145com TGCTTAGAGAGTTTTGTCAAGACT

2410 2420 2430 2440 2450 2460

2460 2470 2480 2490 2500 2510

hshipc CTGGTGOTQAAGTTTGGTGAGACTC^ 

1 1 1 1 1 1 f i mm iiiiiiiiiii iiiii uimmumuuu m 

145com CTCGTCGTACGGTTT<ra^^

2470 2480 2490 2500 2510 2520 

2520 2530 2540 2550 2560 2570 

hshipc TACCTGCTAQAC CAGC^U^TCCTCATCAGCATCAAGTC CTCTGACJUXX3ACGAAXCCTJIT 

in i ii ilium mil mum u iiiiiiiiiii urn mm 

145com TACTThCtQ^CCJ^^^ 

2530 2540 2550 2560 2570 2580 

2580 2590 2600 2610 2620 2630 

hshipc (WCGAGGGCTGCATTGCCCTTCGGTTACau^ 

II II II II! I IMIM 1 1 1 II II III lllllll I llll II lllllllll 

145com GGTGAACaSCTCCATTGCCCTTCGCTTG

2590 2600 2610 2620 2630 2640 

2640 2650 2660 2670 2680 2690 

hshipc CCTCTCACCCACCATGGGGAGTTG^ 

immiiummim mi iiimm m mu iiiumim 

145com CCTCTCACCC^CATGGGGJVGATGJiCnX»

2650 2660 2670 2680 2690 2700

2700 2710 2720 2730 2740 2750 

hshipc TCTCMTCGOWaACGAGC^^ 

ii 1 1 1 1 1 1 1 1 1 1 iiiuiiiuiuiuii huiiu iiii inn iiuiuii 

145com TCaaJSaGCAMfrTt&OQQ^

2710 2720 2730 2740 2750 
2760 2770 2780 2790 2800 2810

hshipc AGTGaxX^UUtfftCCCTG^^

inn it i linn illinium inn mi in nm 

145com J^GTGQAATGAAATGCTTCJ^AG

2770 2780 2790 2800 2810 2820 

2820 2830 2840 2850 2860 2870

hshipc JVCTAGCAGGGCCCCTCCCTGCAGTG^

ii mill mi i ii m mini in in n inn n mm 

145com TCTCGCAGGGTCCCTGCATGTCCTT^^ 

2830 2840 2850 2860 2870 2880

2880 2890 2900 2910 2920 2930 

hshipc ATGGGAGTGGGGCCCTTT^

ii ii mini nm i iimim i m in mini 

145com ATTGGTATGGGGCCTTTTQG XCJWKCCCTGGATGGQAAATCAACCCnn^CC^

2890 2900 2910 2920 2930 

2940 2950 2960 2970 2980 2990 

hshipc GACCAGCAGCCCACAGCCTGGAOCTACGACCMCC

ii nm i mm nm n mini nm mm i iiiimi 

145com GATCAGCAACTC^CAGCTT^^ 

2940 2950 2960 2970 2980 2990 

3000 3010 3020 3030 3040 3050 

i nm ii 14I11U n linn in ii imii ii inimii 

145com GGGAGGGG<XSAGGGTCC^

3000 3010 3020 3030 3040 3050

3060 3070 3080 3090 3100 3110

hshipc TOVOTCTCAAaiGau^

i i i ii iii i nm mi i ' mm n m mi n i n 

145com TCATCITC«CAACCAACCGAG(n^

3060 3070 3080 3090 3100 3110 

3120 3130 3140 3150 3160 3170 

hshipc CTGGGGAJU5AACGCASGGGAO^a3CTGCCT

nm iii i iii i nil i i i 1 1 1 1 1 1 1 1 1 1 imnmimm 

145com CTOGGAAAG- - - -GT(y5A3y5CTCl ^ lt:CAQGAGGMCtt

3120 3130 3140 3150 3160 

3180 3190 3200 3210 3220 3230 

hshipc ATOTTTdAflAA^

imiimimi mimi m mnimmmm i mmn 

145com JlTOTTTQAGAACCCACTCTATtWATC

3170 3180 3190 3200 3210 3220 

3240 3250 3260 3270 3280 3290 

hshipc GACCAOGAATCCC^CJUUUWroCCGCaGAAC^^ 

ii nm n nm mi iimim imiiiim n n n n m 

145com GAGCACXSAGTCTCCCAAGATGC^ 

3230 3240 3250 3260 3270 3280 

3300 3310 3320 3330 3340 3350 

hshipc TTGTCGCCCAGCATCGTCCTCACCAXACCCX!AOGJUre

in minimum miiiiiii mi n i i i nm i n 

145com TCATCreCCAflC XT CCIGCT C CCCAlU^

3290 3300 3310 3320 3330 3340 
3359 3360 3370 3380 3390

hshipc AAGCAGG-"rTO CCWCO<£CCGGCTGCGCTC^^

it Mil u in 1 nun 1 11111111 ii 11 11 

145com iU^ACAGGCCCCTC^CCTGTCCTO 

3350 3360 3370 3380 3390 3400 

3400 3410 3420 3430 3440 3450

hshipc TCCTCTGCCGAGGGCAGGGCGGCCGGC^ 

II MIM 1 1 1 1 1 1 1 1 I II I MM!!! I Mill IIIIIIIMM III M I 

145com TCTrCTGCTQAGGGCAfiAATGACCAGTG^ 

3410 3420 3430 3440 3450 3460

3460 3470 3480 3490 3500 3510 

hshipc CTCAGCTCCCAGGCCCCGGTCTC 

1 in 11 1 1 1 inn 11111 1 111111111 1 1 1 1 1 1 1 1 1 1 1 1 1 11 inn 

145com C^CAGTTCCCAAGCCCCAGT^ 

3470 3480 3490 3500 3510 3520 

3520 3530 3540 3550 3560 3570 

hshipc AACCAGCAGACCCCGCCC&CC

I II 1 1 f 1 1 1 1 I MM Ml I II Mill II IIIIMMMMII M II 

145com AGCOUJCAGACAAC^CCATC

3530 3540 3550 3560 3570 3580

3580 3590 3600 3610 3620 3630 

hshipc GTGCTGCACCTCCAaCACTCCAAGGGCCGCGACT

II Mill II 11 II HIM Ml I 1 1 1 1 1 1 1 1 MINIM II Mill II 

145com OTCCTGCAGCTGCAACATTCCAAAGGC 

3590 3600 3610 3620 3630 3640 

3640 3650 3660 3670 3680 3690 

hshipc cacggcaagcaccggccggaos^^ 

II Mlllllim I llllll MM II 1 1 1 1 1 1 1 It 1 1 1 1 M 1 1 1 

145com CATGGCAAGCACCGCCAAGAGGAG GGGCTGCTTpGCAGGACTQCCATGCAG

3650 3660 3670 3680 3690 

3700 3710 3720 3730 3740 
hshipc TOAAOCCCTCJtfTGJUSC^ 

II Ml Mil I I Ml III II II II I III 

145com TG-AGCTGCIGCXXOTCG^

3700 3710 3720 3730 3740 3750 

3750 3760 3770 3780 3790 

hshipc -TOAAGCCACT 0»-CCCTCTCCXraa3lCC^

II Ml II III Mill IMMHIIMI I II I II MM Mil! 

145com XSCATQCCTCTCTOMSQATOCCTCn 

3760 3770 3780 3790 3800 3810 

3800 3810 3820 3830 3840 3850 

hshipc CCTATaaaUSG CTTKTO 

I I I MIM III MM III II II I II I II I 

145com CM6TCCCAGQCTGTGTATTTT- TTTTCAGGAAACGGCCTCACT- - -TCTCTGTG- GTCC
3820 3830 3840 3850 3860 3870

3860 3870 3880 3890 3900 3910

hshipc ACTGC CTXTTCAGGCTTAGCACCAAOTQ CTGAGOCTQOAAGAAAAAC * GCACACCAGACGG 

I MM III Ml II III I III I I I M III 

145com AMKA O ro TO CI Q C XP QCK^^

3880 3890 3900 3910 3920 3930
FIGURE 13 CONT'D



3930 3940 3950 3960 3970

in imh i n i i in in i mi in ii i mi 

145com AC(X:CATAC*GACA(X*(^^ 

3940 3950 3960 3970 3980 3990

3980 3990 4000 4010 4020 4030 

hshipc TTGAGGGCGCCATTCTQJVAGAAAGGAACTGCAGCGCCGA

ii i m i ii i iinimiiu 

145com TTCCAGTCGCCGTTTIAAAGAAAGGA^ 

4000 4010 4020 4030 4040